CHAPTER  I


INTRODUCTION 


1.1  Unknown  Interference  and  Jamming 


Many  communication  systems  have  to  operate  in  the  presence  of  interference 
or  noise  from  other  sources.  Usually  this  noise  is  less  familiar  to  the  communicator 
than  the  thermal  noise  originating  in  the  receiver  and  often  very  little  is  known 
about  the  particular  form  of  this  noise.  As  a  result  communication  is  carried  out 
over a fuzzily-known channel whose parameters depend on the specific kind of
noise  present.  Clearly,  if  communication  is  to  be  carried  out  reliably,  some  loss 
in  the  rate  at  which  this  is  performed  will  have  to  be  suffered  to  guard  against 
the  potentially  bad  channels  that  may  be  encountered.  All  this  is  true  for  a  wide 
variety  of  situations  which  arise  in  practice.  For  example,  multiple  access  noise 
(due  to  transmissions  by  other  users  over  a  common  channel)  is  a  case  in  point. 
Depending  on  the  message  generation  rates  at  the  different  transmitters  the  noise 
may be of unknown intensity, of unknown duration and of unknown timing. Another
case which is potentially more detrimental to reliable communication is when
there is actually a hostile adversary present, referred to as a jammer, who works
at cross-purposes with the communicator. The jammer is both hostile and intelligent
in the sense that he makes inimical use of the system parameters that he
knows of. For communication to be possible at all, the jammer must necessarily
be  constrained.  Similarly,  the  communicator  is  also  subject  to  constraints  in  terms 
of  resources  such  as  power  or  bandwidth.  This  scenario  lends  itself  quite  naturally 
to a formulation in terms of two-person non-cooperative game theory. The
communicator's strategy set incorporates the parameters he controls, for
instance the input signal energy, the rate of information symbols, the kind of
encoding and decoding used, and the kind of quantization used. The jammer's strategy
set incorporates the noise parameters he introduces into the channel, for
example the noise variance or the kinds of distributions of the noise random variables.
Various payoff or objective functions have also been studied in the literature,
such as probability of error, mean square distortion or mutual information.

Information-theoretic analyses which address these situations where the channel
parameters at the time of transmission are not completely known have developed
well-studied channel models such as the multiple access channel, the interference
channel, the compound channel, the arbitrarily varying channel, and various
other cases therein. The first two models are suited to the study of interference
originating  from  other  users  who  are  competitive  but  not  necessarily  hostile  in 
intent.  The  theme  of  the  analyses  in  those  cases  is  to  achieve  or  guarantee  some 
kind  of  cooperation  or  coordination  with  limited  knowledge  of  the  transmission 
parameters  of  the  other  users  through  the  use  of  encoders  and  decoders  and  to 
find  out  the  rate  region  in  which  reliable  communication  is  possible.  The  latter 
three  models  can  be  used  to  model  the  kind  of  antagonistic  interference  originating 
from  a  jammer.  For  instance,  the  compound  channel  model  may  be  viewed  as  one 
where  the  jammer  chooses  one  out  of  a  set  of  possible  channels  (by  choosing  one 
out  of  a  set  of  possible  noise  distributions)  and  the  communicator  chooses  one  out 
of all the possible encoding and decoding strategies. A question of interest
is then: what is the maximum rate at which the communicator can
transmit information reliably irrespective of the particular noise distribution the
jammer uses? The solution to this problem [Wolf 78], [Blac 59], which is the capacity
of this channel, is $\max_{dP(x)} \min_{C \in \mathcal{C}} I(X;Y)$ (where $dP(x)$ is the distribution of the
input symbols, $C$ is a channel selected by the jammer out of the set $\mathcal{C}$, $X$ and $Y$
are the channel input and output random variables respectively and $I$ is the
mutual information function). This illustrates the point made earlier of a game
theoretic  formulation  being  a  natural  one  for  such  problems.  The  solution  clearly 
indicates  that  the  compound  channel  situation  where  rate  of  reliable  transmission 
is the objective and coding and decoding are strategies available to the communicator
may be viewed as a two-person game where the communicator's strategy set
is  the  set  of  all  probability  distributions  on  the  input  and  the  payoff  is  the  mutual 
information  between  the  input  and  the  output.  The  compound  channel  model  is 
useful  in  modelling  those  kinds  of  jamming  where  the  jamming  is  not  adaptive, 
i.e.  the  jammer  does  not  make  use  of  the  channel  parameters  at  transmission  time 
to  decide  his  subsequent  jamming  strategies.  If  the  jammer  is  sophisticated  and 
adapts  his  strategy  the  appropriate  model  to  analyze  rate  of  reliable  transmission 
questions  is  the  arbitrarily  “star”  varying  channel  [Csiz  81  pg.233].  Here  for  each 
transmission  of  a  channel  symbol  a  new  channel  (one  out  of  some  known  set  of 
channels) may be presented to the communicator based on the jammer's knowledge
of the previous (and maybe, present) transmitted symbols. Clearly this case
includes the compound channel model described before. It also presupposes a jammer
who is omniscient in that the channel parameters at the time of every channel
transmission are known to him and utilized with malicious intent on the subsequent
transmission. Solutions to these problems may be viewed as conservative
in  the  context  of  less  than  omniscient  real  world  jammers.  The  arbitrarily  “star” 
varying  channel  can  also  be  structured  into  a  two-person  zero-sum  game  theoretic 
formulation. In Chapter 2 we will utilize the compound channel model for all our
analysis and will explain in the conclusions how the results may, in many cases, be
extended to the arbitrarily “star” varying channel.

1.2  Spread  Spectrum  Counter-measures 

The  most  widely  used  counter-measure  to  counteract  the  effect  of  a  jammer  as 
described  above  is  spread-spectrum  modulation.  The  idea  is  for  the  communicator 
to  have  potential  use  of  a  much  larger  “space”  for  transmission  than  ordinarily 
needed  and  use  private  information  (i.e.  information  not  accessible  to  the  jammer) 
to  enable  the  receiver  to  determine  “where”  an  individual  transmission  is  located. 
The  use  of  such  a  strategy  forces  the  jammer  to  either  expend  his  finite  power  over 
the  entire  space  thereby  reducing  his  effectiveness  or  to  use  some  other  strategy 
wherein  the  jammer  may  allow  some  transmissions  to  occur  unimpeded  but  jam 
the  remaining  with  higher  power.  In  fact,  partial-band  jamming  or  pulsed  jamming 
which  exploits  the  latter  idea  can  be  shown  to  be  very  effective  and  does  seriously 
degrade  the  communicator’s  performance. 

Typically in a spread spectrum channel the performance in additive white Gaussian
noise is identical to the performance of non-spread systems; namely the bit
error probability decreases exponentially with signal-to-noise ratio. However, when
subject to worst-case partial-band or pulsed jamming (wherein power is concentrated
in time or frequency to affect only a fraction of the symbols transmitted
while allowing the remaining to be received “error-free”) the bit error probability
of a spread-spectrum system decreases only inverse linearly with the signal-to-noise
ratio. This is a significant degradation, typically of the order of 30-40 dB for a bit
error probability on the order of $10^{-5}$.
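
The contrast between exponential and inverse linear decay can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not part of this chapter: uncoded noncoherent binary FSK, no thermal noise, and the closed form (1/2) exp(-g/2) for the bit error probability under broadband noise jamming, where g stands for the bit energy to jammer noise density ratio. A jammer that concentrates its power on a fraction rho of the band then produces an average error probability (rho/2) exp(-rho g/2), which is maximized at rho = 2/g whenever g >= 2. All function names are hypothetical.

```python
import math

def pb_broadband(g):
    """Noncoherent BFSK bit error probability with jammer power spread over the whole band."""
    return 0.5 * math.exp(-g / 2.0)

def pb_partial_band(g, rho):
    """Average bit error probability when only a fraction rho of the band is jammed."""
    return 0.5 * rho * math.exp(-rho * g / 2.0)

def worst_case_fraction(g):
    """Fraction rho that maximizes pb_partial_band for a given g (capped at 1)."""
    return min(1.0, 2.0 / g)

def pb_worst_case(g):
    """Bit error probability against the worst-case partial-band jammer."""
    return pb_partial_band(g, worst_case_fraction(g))

def required_ratio_db(target, pb):
    """Smallest g (in dB) with pb(g) <= target, found by bisection on a log scale.
    Assumes pb is decreasing in g, which holds for both curves above."""
    lo, hi = 1e-3, 1e9
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if pb(mid) > target:
            lo = mid
        else:
            hi = mid
    return 10.0 * math.log10(hi)

# Extra signal-to-jammer ratio needed at a bit error probability of 1e-5.
degradation_db = required_ratio_db(1e-5, pb_worst_case) - required_ratio_db(1e-5, pb_broadband)
```

Under these assumptions the worst-case curve equals exp(-1)/g for g >= 2, and the degradation at a bit error probability of 1e-5 comes out at roughly 32 dB, which falls inside the 30-40 dB range quoted above.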

To  remedy  this  situation  most  systems  use  some  form  of  error-correction  coding. 
For example, it can be shown that with a hard decision decoder if the code rate
is small (< 1/2) and the jammer is allowed to pulse between several Gaussian
distributions  then  there  is  no  loss  in  signal-to-noise  ratio  necessary  for  reliable 
communications  compared  to  an  additive  Gaussian  noise  channel  with  the  same 
(average)  power.  So  it  can  be  said  that  coding  (with  hard  decision  demodulation) 
neutralizes  a  (power  constrained)  jammer  (i.e.,  makes  the  performance  the  same  as 
additive  white  Gaussian  noise).  It  can  also  be  shown  that  the  worst  case  jamming 
strategy  is  to  pulse  between  two  zero  mean  Gaussian  noise  distributions,  one  of 
which  has  zero  variance. 

As  has  been  well  known  in  the  communication  field,  hard  decision  decoding 
loses  roughly  2  dB  in  signal-to-noise  ratio  compared  to  soft  decision  decoding. 
Thus  considerable  interest  has  focused  on  soft-decision  decoding.  One  problem 
that has been observed is that if a (soft) decoding algorithm designed for a non-jammed
channel is used for a jammed channel then the performance is extremely
poor when the jamming strategy is optimized. One method for “overcoming” this
difficulty is to assume the jamming noise is one of two distributions (usually one
having zero variance called the “off” state and the other called the “on” state) and
that the decoder knows perfectly when the jammer is “on” and when the jammer
is “off”. Using this side information, similar results to the hard decision case have
been obtained for the soft decision case (for small rates there is no loss in performance).
However assuming this information is available is assuming away the
problem.  Most  systems  analyses  do  not  incorporate  jamming  strategies  that  affect 
the  reliability  of  the  side  information.  A  jamming  strategy  not  usually  considered 
by  such  analyses  is  a  strategy  that  tries  to  minimize  the  reliability  of  the  side 
information. 

Motivated by the improvement in performance of soft decisions over hard decisions
many researchers have considered decoding algorithms that do not assume
side information and do not do hard decision decoding. However, most of these
algorithms still assume the jammer pulses between one of two levels. In Chapter 2
we  investigate  the  case  of  a  decoder  that  processes  symbols  from  a  finite  alphabet 
and  where  the  only  constraints  on  the  jammer  are  average  and  peak  power.  We 
formulate  the  problem  as  a  game  with  two  players:  the  jammer  whose  strategy  set 
consists  of  distributions  modulating  the  jamming  noise,  and  the  communicator, 
whose  strategy  set  consists  of  a  pair  of  distributions,  one  on  the  input  alphabet 
and  one  on  a  set  of  quantizers.  We  look  for  worst-case  jamming  strategies  and 
investigate  when  the  game  admits  of  a  saddle  point.  We  do  the  analysis  using 
both  mutual  information  (which  is  equivalent  to  capacity)  and  channel  cutoff  rate 
as  our  objective  functions. 

1.3  Partial-Band  Interference

In  Chapter  3,  we  do  the  detailed  analysis  for  a  particular  kind  of  signalling  and 
jamming, i.e. orthogonal signalling in worst-case partial-band jamming. Partial-band
jamming is a simple and effective jamming strategy often used against a
frequency-hopped spread-spectrum system. It may also be visualized as a worst-case
model of partial-band interference, i.e. situations where the frequency-hopped
spread spectrum system is subjected to interference in some fraction of the total
spread bandwidth the transmitter is using. For example, in a spread-spectrum
multiple-access communication system with different users using different hopping
patterns such partial-band interference occurs when two users hop to the
same frequency slot at the same time. Another instance is when there is some fading
of the transmitted signal in certain frequency bands. The underlying idea in
partial-band jamming is this: by concentrating more jamming power on a fraction
of the transmitted bits the jammer is willing to allow some transmissions to occur
unimpeded but causes high error probabilities for the fraction of bits jammed. As
the uncoded bit error probability varies dramatically with small changes in the bit
energy to jammer noise ratio $E_b/N_J$ ($N_J$ is the one-sided power spectral density
of the jammer's noise when spread uniformly across the whole bandwidth) the
jammer can, by an appropriate choice of this fraction, cause a significant increase
in the average bit error probability. This is the reason for the well known degradation
of uncoded systems in worst-case partial-band (or pulsed-time) jamming
[Hous 75], [Simon 85]. The worst-case jammer chooses a different fraction for each
$E_b/N_J$ and can thereby convert the negative exponential dependence of the bit
error probability on $E_b/N_J$ to an inverse linear dependence. Thus, for instance,
in frequency-hopped M-ary frequency shift keying (MFSK), this causes the dramatic
degradation of greater than 14 dB at a bit error probability of about $10^{-3}$
and greater than 30 dB at a bit error probability of about $10^{-5}$. For large $E_b/N_J$
typically the worst case fraction jammed is small and thus while the transmissions
do not get jammed most of the time those that do are very likely to cause errors.
This suggests that some form of coding redundancy that causes data decisions to
depend on multiple symbol transmissions can reduce the effectiveness of partial-band
jamming. Typically, besides the usual coding gain we have in these situations
the additional gain from the neutralization of the partial-band jammer, i.e. the
worst-case fraction with coding is larger. It is known from previous work [Star 82]
that, by using codes with rates less than a constant depending on the form
of modulation and demodulation used, partial-band jamming can be effectively
neutralized.

Orthogonal  signals  perform  well  asymptotically  in  AWGN  although  they  have 
low rate. Our analysis in Chapter 3 shows that orthogonal signals perform poorly in
partial-band jamming even asymptotically. We then investigate the performance
of orthogonal signalling with diversity. We do the analysis for diversity using
different diversity combining schemes: majority logic combining, linear combining
and clipped linear combining. All the analysis is asymptotic and again we are
interested  in  the  performance  in  the  presence  of  the  worst-case  jammer.  Of  the 
above  schemes  only  clipped  linear  combining  performs  well  asymptotically. 
CHAPTER  II


WORST-CASE  ANALYSIS  OF  UNKNOWN 
INTERFERENCE 


2.1  Introduction 


We consider a modulator that transmits one out of M signals. This transmitted
signal is denoted by the random variable X. The received signal which
has  been  corrupted  by  the  jammer  in  some  fashion  is  demodulated  and  quantized 
into  one  of  L  values.  In  order  to  disallow  the  jammer  from  using  knowledge  of 
the  quantizer  in  designing  his  worst-case  strategy,  we  allow  randomization  of  the 
quantizer  over  some  given  set  of  quantizers.  Clearly  such  randomization  increases 
the  size  of  the  communicator’s  strategy  set.  Thus,  we  view  this  situation  as  a 
game  with  two  players;  the  jammer  and  the  communicator.  The  jammer  selects 
the  noise  in  the  channel  and  the  communicator  chooses  the  encoder,  the  decoder 
and  the  quantizer.  The  strategy  set  for  the  jammer  is  the  set  of  all  distributions  on 
the  power  of  the  jamming  noise  subject  to  the  given  constraints  on  the  peak  and 
average  power.  The  strategy  set  for  the  communicator  is  the  set  of  all  distributions 
on  the  input  alphabet  and  on  the  set  of  quantizers. 

For  this  general  set  up  we  show  that  the  worst  case  jamming  strategy  from  the 
communicator's  perspective  is  to  pulse  between  a  finite  number  of  power  levels. 


We  also  consider  the  case  of  random  decoding  strategies  where  the  demodulator 
output  is  quantized  into  a  finite  number  of  outputs  by  a  randomized  quantizer,  i.e. 
the  quantization  thresholds  are  random. 

For this case we show that the optimal randomized quantizer can perform better
than the nonrandomized quantizer and that from the jammer's point of view
the worst-case distribution of the thresholds is also concentrated on a finite number
of points. Our basic model can be easily seen to fit a frequency-hop communication
system in which the modulation utilizes an M-ary signal set, which in many cases
consists of orthogonal signals. The spread-spectrum bandwidth is divided into a large
number of frequency slots. Each possible modulated signal is hopped from frequency
slot  to  frequency  slot  using  a  pseudo-random  hopping  pattern.  During  each  hop, 
one  of  the  M  signals  is  transmitted.  There  are  two  important  special  cases.  First, 
all  modulated  signals  use  the  same  hopping  pattern  and  second,  each  signal  has 
its  own  hopping  pattern.  The  demodulator  is  a  coherent  or  noncoherent  matched 
filter  the  output  of  which  is  then  quantized  to  a  finite  number  of  values. 

The  remainder  of  the  chapter  is  organized  as  follows.  In  Section  2  we  define 
the models we will be considering and give examples for which our models apply.
In  Sections  3  and  4  we  derive  the  above  stated  results  considering  the  worst  case 
jamming  strategy  and  the  optimal  quantizer  strategy  using  mutual  information 
as  our  objective  function  for  the  cases  with  the  decoder  uninformed  about  the 
quantizer  and  informed  about  the  quantizer  respectively.  In  Section  5  we  do  the 
analysis  for  the  case  when  the  quantizer  is  fixed,  i.e.  no  randomization  of  the 
quantizer.  We  then  use  the  channel  cutoff  rate  as  our  objective  function  and 
derive  similar  results  in  Section  6.  Finally,  in  Section  7,  we  state  our  conclusions 
and  extend  our  results. 


2.2  Channel  Models 


In  this  section  we  describe  the  models  we  use  in  the  subsequent  analysis.  In  all 
cases we consider a modulator that transmits one out of M signals in D dimensions.
This transmitted signal is denoted by the random variable X. The received
signal  which  has  been  corrupted  by  the  jammer  in  some  fashion  is  demodulated 
and  quantized  into  one  of  L  values.  The  received  signal  is  denoted  by  the  random 
variable  Y.  (Y  can  also  be  a  random  vector  without  changing  any  of  the  following 
analysis). 

The  general  philosophy  that  we  will  use  is  that  of  game  theory  with  the  players 
being  the  jammer  and  the  communicator.  The  jamming  strategies  are  distributions 
$dF$ on $D$ random variables, $Z_1, Z_2, \ldots, Z_D$. These random variables represent
the power of the jammer in each of the signal dimensions and are modelled as
modulating generic noise variables present in the channel. The jammer, however,
has an average power constraint and a peak power constraint. More generally the
jammer is constrained by

$$\int f(z_1, z_2, \ldots, z_D)\, dF(z_1, z_2, \ldots, z_D) \le K_J \qquad (2.1)$$

and

$$0 \le z_j \le b_j, \quad j = 1, \ldots, D \qquad (2.2)$$

where $b_j$ is the peak power constraint and $f(z_1, \ldots, z_D)$ is some continuous
functional of $(z_1, \ldots, z_D)$. For average power constrained channels with no peak
constraint we let $b_j$ become very large.
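
As an illustration of constraints (2.1) and (2.2), the following sketch checks whether a jamming strategy with finitely many mass points is admissible. It is a minimal example under stated assumptions: the distribution dF is discrete, and the names `is_admissible`, `levels`, `probs` and `cost` are hypothetical, with `cost` playing the role of f.

```python
def is_admissible(levels, probs, cost, K_J, b, tol=1e-12):
    """Check a discrete jamming distribution against (2.1) and (2.2).

    levels: list of D-tuples (z_1, ..., z_D), the mass points of dF
    probs:  probability placed on each mass point
    cost:   the function f of (2.1), taking a D-tuple
    K_J:    bound on the expected cost
    b:      D-tuple of peak bounds b_j
    """
    # dF must be a probability distribution.
    if any(p < 0.0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        return False
    # Peak constraint (2.2): every coordinate of every mass point lies in [0, b_j].
    peak_ok = all(0.0 <= z_j <= b_j for z in levels for z_j, b_j in zip(z, b))
    # Average constraint (2.1): the expected cost must not exceed K_J.
    average = sum(p * cost(z) for z, p in zip(levels, probs))
    return peak_ok and average <= K_J + tol

# One-dimensional example (D = 1) with f(z) = z_1, K_J = 1 and b_1 = 4.
pulsed_ok = is_admissible([(0.0,), (4.0,)], [0.75, 0.25], lambda z: z[0], 1.0, (4.0,))
too_strong = is_admissible([(0.0,), (4.0,)], [0.50, 0.50], lambda z: z[0], 1.0, (4.0,))
over_peak = is_admissible([(0.0,), (5.0,)], [0.80, 0.20], lambda z: z[0], 1.0, (4.0,))
```

The first strategy pulses between "off" and the peak level one quarter of the time and meets both constraints; the second violates the average constraint and the third violates the peak constraint.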

The output of the demodulator is quantized into one of $L$ values, say $0, 1, \ldots, L-1$.
The output of the quantizer, $Y$, is also the output of the channel for coding.
The strategies for the communicator are to choose a distribution, $dG(\theta)$, on the
quantization thresholds and a distribution, $dP(x)$, on the input alphabet. We will
let $\theta$ parametrize the quantizers and assume $\Theta$ is some compact subset of $\mathbb{R}$ ($\Theta$
will be used to denote both the random variable as well as the set of quantizers).
We assume $\Theta$ and $X$ are independent. For each $(z_1, \ldots, z_D)$ and $\theta$ there
is a probability distribution on the output of the channel given the input of the
channel:

$$\mathrm{Prob}\{Y = y \mid X = x, \Theta = \theta, Z_1 = z_1, Z_2 = z_2, \ldots, Z_D = z_D\} = p(y \mid x, \theta, z_1, z_2, \ldots, z_D). \qquad (2.3)$$

The  above  model  describes  the  input  output  relation  of  the  channel  for  a  particular 
symbol.  In  addition  we  model  the  channel  as  being  memoryless. 

In  all  of  our  analysis  we  assume  that  the  jammer  and  the  decoder/quantizer 
have  complete  information  about  the  set  of  strategies  possible  for  each  other  so 
that  no  secret  information  is  considered.  As  mentioned  previously,  the  performance 
measure  we  consider  is  the  largest  rate  such  that  reliable  communication  (in  the 
sense  of  arbitrarily  small  error  probability)  is  possible.  The  type  of  channels  we 
are  considering  are  known  as  compound  channels  with  the  set  of  channels  (out  of 
which  one  is  chosen)  indexed  by  dF(z).  We  consider  the  strategies  (distributions) 
by  the  jammer  to  be  constant  for  a  whole  codeword  as  opposed  to  (possibly) 
changing  after  each  symbol  of  a  codeword  which  would  correspond  to  an  arbitrarily 
varying  channel.  For  compound  channels  the  capacity  with  finite  input  and  output 
is  well  known  to  be  the  maximum  of  the  minimum  mutual  information.  The 
minimum  is  over  all  possible  transition  probabilities  and  the  maximum  is  over  all 
probability  distributions  on  the  input  to  the  channel.  Thus,  using  the  maximum  of 
the  minimum  mutual  information  as  the  performance  measure  corresponds  to  the 
largest  rate  such  that  reliable  communication  is  possible  no  matter  what  strategy 
the  jammer  employs. 

We  now  introduce  some  notation.  Let: 


$A = \{0, 1, \ldots, M-1\}$ be the input alphabet,

$B = \{0, 1, \ldots, L-1\}$ be the output alphabet,

$\Theta$ be the quantizer parameter space (some compact subset of $\mathbb{R}$),

$Z$ be $(Z_1, \ldots, Z_D)$, $(0 \le Z_i \le b_i)$,

$p(y|x,\theta,z)$ be the transition probability from $x$ to $y$ given $\theta$, $z$, and
$P_{yx}(\theta, z)$ the corresponding stochastic matrix, $P_{yx}(\theta, z) = [p(y|x,\theta,z)]$.
We assume that

(i) $p(y|x,\theta,z)$ is continuous in $z$ for all $\theta$, $x$ and

(ii) $p(y|x,\theta,z)$ is continuous in $\theta$ for all $x$, $z$.

Let $S$ denote the set of all probability distributions on the Borel sets of
$K = \{(z_1, \ldots, z_D) : 0 \le z_i \le b_i\}$, and

$$I(G,P;F) = I\Big(\int_K \int_\Theta P_{yx}(\theta,z)\, dG(\theta)\, dF(z)\Big)$$

$$= I\Big(\int_\Theta \bar{P}_{yx}(\theta)\, dG(\theta)\Big)$$

$$= I\big(\bar{P}_{yx}(G,F)\big) \qquad (2.4)$$

where $\bar{P}_{yx}(\theta)$ is $P_{yx}(\theta,z)$ averaged over $dF(z)$ and $I(\bar{P}_{yx}(G,F))$ is the
mutual information whenever $X$ and $Y$ are related by the stochastic matrix $\bar{P}_{yx}$.
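
The quantity in (2.4) can be evaluated directly when G and F place their mass on finitely many points. The sketch below is a minimal illustration under that assumption: it averages the stochastic matrix over both distributions and then computes the mutual information of the averaged channel. The channel `toy_bsc` is a hypothetical stand-in (a binary symmetric channel whose crossover probability is given by z), not one of the channels analyzed in this chapter.

```python
import math

def mutual_information(P, W):
    """I(X;Y) in bits for input distribution P[x] and stochastic matrix W[x][y]."""
    n_out = len(W[0])
    p_y = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(n_out)]
    total = 0.0
    for x, p_x in enumerate(P):
        for y in range(n_out):
            if p_x > 0.0 and W[x][y] > 0.0:
                total += p_x * W[x][y] * math.log2(W[x][y] / p_y[y])
    return total

def averaged_matrix(channel, G, F):
    """Average channel(theta, z) over discrete G and F, given as lists of (value, probability)."""
    acc = None
    for theta, g in G:
        for z, f in F:
            W = channel(theta, z)
            if acc is None:
                acc = [[0.0] * len(W[0]) for _ in W]
            for x, row in enumerate(W):
                for y, w in enumerate(row):
                    acc[x][y] += g * f * w
    return acc

def I_GPF(channel, G, P, F):
    """I(G, P; F) of (2.4): mutual information of the averaged stochastic matrix."""
    return mutual_information(P, averaged_matrix(channel, G, F))

def toy_bsc(theta, z):
    """Hypothetical channel: binary symmetric, crossover probability z, theta ignored."""
    return [[1.0 - z, z], [z, 1.0 - z]]

# A jammer pulsing between z = 0 and z = 0.5 with equal probability.
value = I_GPF(toy_bsc, G=[(0.0, 1.0)], P=[0.5, 0.5], F=[(0.0, 0.5), (0.5, 0.5)])
```

In this toy case the averaged channel is a binary symmetric channel with crossover probability 0.25, so `value` equals one minus the binary entropy of 0.25. Note that the averaging happens before the mutual information is taken, which is what distinguishes (2.4) from an average of per-state mutual informations.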

We now illustrate the applicability of the above model with an example. Consider
a frequency-hopped, binary ($M = 2$) phase shift keyed signal set with data
signal $d(t)$:

$$d(t) = \sum_{n=-\infty}^{\infty} X_n\, p_T(t - nT)$$

where $X_n \in \{-1, 1\}$ represents the information bit at time interval $nT$ and

$$p_T(t) = \begin{cases} 1, & 0 \le t < T \\ 0, & \text{elsewhere.} \end{cases}$$

Here $p_T(t)$ is the unit pulse on $[0,T]$ and $T$ is the duration of a data bit. The signal
after modulation can be represented by

$$s(t) = \sqrt{2P}\, d(t) \cos 2\pi f_c t$$

where $P$ is the power of the transmitted signal. In this example, $D$, the dimensionality
of the signal set, is 1 (we are using antipodal signalling). After frequency
hopping we have the signal $s'(t)$ where

$$s'(t) = \sqrt{2P}\, d(t) \cos\big(2\pi (f_c + f_h(t))\, t\big)$$

where

$$f_h(t) = \sum_{n=-\infty}^{\infty} f_n\, p_{T_h}(t - nT_h),$$

$p_{T_h}(t)$ is the unit pulse on $[0,T_h]$, $\{f_n\}_{n=-\infty}^{\infty}$ is an i.i.d. uniform sequence over
the set $\{f_1, \ldots, f_q\}$, and $T_h$ is the hop duration. For simplicity let us assume
$T_h = T$. The jamming signal added in the channel may be described as

$$j'(t) = \sum_{i=1}^{q} W_i(t)\, j_i(t) \cos 2\pi f_i t$$

where

$$W_i(t) = \sum_{l=-\infty}^{\infty} W_{i,l}\, p_T(t - lT)$$

and $W_{i,l}$ is the jamming power in the $i$th frequency slot during $lT \le t < (l+1)T$
and where $j_i(t)$ is some unit normalized noise random process in the $i$th frequency
slot. Also, for the abstract model, $Z_1$ would be the random variable $W_{i,l}$ in the
frequency slot chosen for transmission of the signal during the $l$th time interval.
Given any quantizer $\theta$, the channel transition matrix $p(y|x,\theta,z)$ can be calculated
by considering the effect of such a quantizer on the random variables at the output
of the matched filter. The received frequency-dehopped signal is

$$r(t) = s(t) + \sum_{i=1}^{q} W_i(t)\, j_i(t)\, \delta(f_h(t), f_i).$$

The nature of $j_i(t)$ determines the type of jamming that is involved. Thus
if $j_i(t)$ were Gaussian noise, we would be dealing with Gaussian jamming. If,
on the other hand, $j_i(t)$ is chosen to be $a_i(t) \cos(2\pi f_c t + \phi_i(t))$, then we have a
model for tone jamming for which

$$a_i(t) = \sum_{n=-\infty}^{\infty} V_n\, p_T(t - nT)$$

where $V_n \in \{-1, +1\}$ and

$$\phi_i(t) = \sum_{l=-\infty}^{\infty} \phi_l\, p_T(t - lT)$$

where the $\phi_l$'s are i.i.d. random variables uniform on $[0, 2\pi]$.

Consider the Gaussian jamming case with $X_n$ i.i.d. and equally likely to be
$-1$ or $1$ and with each transmission equally likely to be in any of the $q$ frequency
slots. In terms of our model, since $D = 1$ for the BPSK signal set, $Z_1$ is a random
variable with the same distribution as $W_{i,l}$, $f(z_1) = (z_1)$, $K_J = 1$ (say) and the $b_i$'s
can be any arbitrary constants greater than or equal to 1. For tone jamming the
only difference in the model is that $N$ is the random variable $\cos\phi$ where $\phi$ is
uniformly distributed over $[0, 2\pi]$.

The normalized output of the demodulator is then of the form

$$U = X + NZ$$

with $X \in \{+1, -1\}$, $N$ a generic random variable and $Z$ the jamming strategy chosen
by the jammer. We can see that the interference during the $l$th time interval at
the output of the matched filter is a random variable of the form $I = N W_{i,l}$ with
probability $1/q$ for $i = 1, \ldots, q$, where $N$ is a standard normal random variable. If
the output of the demodulator is quantized by a three level quantizer, for example,
then

$$Y = q_3(U)$$

where $q_3(u)$ is given by

$$q_3(u) = \begin{cases} 1, & u \ge \theta \\ ?, & -\theta < u < \theta \\ -1, & u \le -\theta, \end{cases}$$

$0 \le \theta \le 1$. Then $p(y|x,\theta,z)$ is given by (for $N$ Gaussian)

$$p(y|x,\theta,z) = \begin{cases} 1 - Q\big(\frac{1-\theta}{z}\big), & y = x \\ Q\big(\frac{1-\theta}{z}\big) - Q\big(\frac{1+\theta}{z}\big), & y = \;? \\ Q\big(\frac{1+\theta}{z}\big), & y \ne x,\ y \ne \;?. \end{cases}$$
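
The three-level quantizer probabilities can be evaluated numerically. The sketch below assumes the model U = X + NZ with N standard normal and a symmetric threshold theta in [0, 1]; the tail function Q is written in terms of the complementary error function, and the function name `transition_probs` is hypothetical. The case z = 0 (jammer off) is handled separately, since the received value is then noise-free.

```python
import math

def q_func(t):
    """Gaussian tail probability Q(t) = P(N > t) for a standard normal N."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def transition_probs(theta, z):
    """Return (correct, erasure, error) probabilities for a threshold theta in [0, 1]
    and a jamming level z >= 0. By symmetry they do not depend on the sign of X."""
    if z == 0.0:
        # No noise: U = X exactly, and |X| = 1 >= theta, so the decision is correct.
        return (1.0, 0.0, 0.0)
    q_near = q_func((1.0 - theta) / z)   # noise pushes U below the nearer threshold
    q_far = q_func((1.0 + theta) / z)    # noise pushes U past the farther threshold
    return (1.0 - q_near, q_near - q_far, q_far)

hard_decision = transition_probs(0.0, 1.0)   # theta = 0 removes the erasure region
with_erasures = transition_probs(0.5, 1.0)
```

With theta = 0 the quantizer reduces to a hard decision, the erasure probability vanishes, and the error probability is Q(1/z). Raising theta trades errors for erasures, which is the degree of freedom the communicator randomizes over.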

Returning to our abstract model, the strategies for the jammer are all distributions
$dF$ on $Z_1, Z_2, \ldots, Z_D$ satisfying the given constraints. The strategies for
the communicator are all distributions $dP$ on $X$ and all distributions $dG$ on
$\Theta$. The performance measures we are interested in are the largest rate such that
nearly error-free communication can be achieved, i.e. channel capacity, and $R_0$,
the channel cutoff rate. $R_0$ is a very useful channel parameter especially for the
use of convolutional codes. Many researchers believe $R_0$ to be a practical limit
to the set of rates for which reliable communication is possible.

We  consider  two  different  structures  for  the  knowledge  of  information  by  the 
communicator. 

I.  The decoder is unaware of the actual quantizer chosen but only knows the
distribution $dG(\theta)$ on the set of quantizers. The jammer knows only the set
of quantizers but not the distribution $dG(\theta)$ chosen by the communicator.
He is also aware that the decoder does not know the actual quantizer chosen.

II.  The decoder knows the actual quantizer chosen. Again the jammer knows
only the set of quantizers. He also knows that the decoder is aware of
the actual quantizer chosen.



Case  I  is  seen  to  apply  to  situations  where,  for  reasons  of  implementation  perhaps, 
the  decoding  is  fixed  and  not  altered  with  the  specific  quantizer  chosen.  It  may  also 
be  viewed  as  worst-case  in  the  sense  that  the  decoder’s  knowledge  of  the  specific 
quantizer and the utilization of such knowledge can only improve the communicator's
performance. When there is no randomization of the quantizer, i.e. the
quantizer is fixed, Cases I and II are the same and our results for both cases apply
to that situation. Also several special jamming strategies are of interest because
of correspondence with physical problems. We will classify the cases as follows.

A.  Arbitrary joint distribution on $Z_1, Z_2, \ldots, Z_D$.

B.  $Z_1 = Z_2 = \ldots = Z_D = Z$.

C.  One dimensional jamming, i.e., at most one of the random variables $Z_i \ne 0$.

D.  Independent jamming, i.e., $Z_1, Z_2, \ldots, Z_D$ are independent.

Case  B  corresponds  to  the  physical  situation  where  the  jammer  is  not  able  to 
place  different  amounts  of  power  in  different  dimensions  of  the  signal  space.  Case 
C  corresponds  to  the  case  where  only  one  of  the  dimensions  can  be  jammed  at  once. 
Case  D  corresponds  to  a  frequency-hop  communication  system  with  independent 
hopping  for  the  different  symbols.  The  standard  game  theoretic  description  is 
given  below. 

Communicator’s  Perspective 

The communicator is interested in the maximum rate at which information can be
reliably transmitted no matter what strategy the jammer employs. The communicator
designs his system assuming the jammer will somehow find out the strategy
he is using and then choose the worst possible distribution on the power levels. In
Case I the largest rate at which information can reliably be transmitted is

$$\max_{dG,\,dP}\ \min_{dF}\ I(G,P;F)$$

where $I(G,P;F) = I(X;Y)$ when $(dG, dP)$ is chosen by the communicator and
$dF$ is chosen by the jammer, and $I(X;Y) = \sum_{x,y} p(x,y) \log\big(p(y|x)/p(y)\big)$. That
this is the maximum rate of reliable transmission is well known, since what we
are dealing with is a compound channel with a finite input alphabet, a finite
output alphabet and a channel set indexed by the distributions $dF(z)$ [Csiz 81,
pgs. 172-173].
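As a small numerical companion to the definition above, the following sketch evaluates $I(X;Y)$ for a finite channel and takes the worst case over a finite list of candidate jammer-induced channels for a fixed input distribution. The two binary symmetric channels and the helper names (`mutual_information`, `bsc`, `worst_case_rate`) are illustrative assumptions, not part of the model in the text, which allows a continuum of jammer distributions.

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X;Y) in bits for input distribution p_x and row-stochastic channel[x, y]."""
    p_x = np.asarray(p_x, dtype=float)
    channel = np.asarray(channel, dtype=float)
    joint = p_x[:, None] * channel                 # p(x, y)
    p_y = joint.sum(axis=0)                        # p(y)
    mask = joint > 0                               # skip zero-probability terms
    ratio = channel[mask] / np.broadcast_to(p_y, joint.shape)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

def bsc(eps):
    """Binary symmetric channel with crossover probability eps (illustrative)."""
    return np.array([[1.0 - eps, eps], [eps, 1.0 - eps]])

def worst_case_rate(p_x, channels):
    """Inner minimization: the jammer picks the least favourable channel."""
    return min(mutual_information(p_x, w) for w in channels)

# Hypothetical jammer options: two crossover levels.
jammer_channels = [bsc(0.1), bsc(0.2)]
rate_uniform = worst_case_rate([0.5, 0.5], jammer_channels)
rate_skewed = worst_case_rate([0.9, 0.1], jammer_channels)
```

In this toy case the uniform input gives the larger worst-case value, which is the outer maximization in miniature.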

Jammer’s  Perspective 

The jammer is interested in the minimum rate such that information cannot be
reliably transmitted at any higher rate no matter what strategy the communicator
employs. The jammer designs his system assuming the communicator will
somehow find out the strategy he is using and then design the optimal communication
system. In Case I the smallest rate above which the jammer can guarantee that
reliable communication is impossible is

$$\min_{dF}\ \max_{dG,\,dP}\ I(G,P;F).$$

That this is the smallest rate the jammer can guarantee is clear, since for each
$F$ the rate above which reliable communication is impossible is $\max_{dG,\,dP} I(G,P;F)$.

In Case II the appropriate mutual information can be written as an expectation
of the mutual information for a fixed $\theta$:

$$I(G,P;F) = E_G\big(I(\theta,P;F)\big)$$

where $E_G$ refers to taking expectations w.r.t. $dG$ and $I(\theta,P;F) = I(X;Y|\theta)$, where
$I(X;Y|\theta) = \sum_{x,y} p(x)\,p(y|x,\theta) \log\big(p(y|x,\theta)/p(y|\theta)\big)$ since $X$ and $\theta$ are independent.
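The difference between the two cases can be seen numerically. In Case I the decoder effectively faces the averaged channel, while in Case II the rate is the average of the per-quantizer mutual informations; by convexity of mutual information in the channel matrix the latter is never smaller. The sketch below assumes two hypothetical quantizers that induce binary symmetric channels with crossover 0.1 and 0.9, mixed with equal probability; these numbers and the function names are illustrative only.

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X;Y) in bits for input distribution p_x and row-stochastic channel[x, y]."""
    p_x = np.asarray(p_x, dtype=float)
    channel = np.asarray(channel, dtype=float)
    joint = p_x[:, None] * channel
    p_y = joint.sum(axis=0)
    mask = joint > 0
    ratio = channel[mask] / np.broadcast_to(p_y, joint.shape)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

def bsc(eps):
    return np.array([[1.0 - eps, eps], [eps, 1.0 - eps]])

def uninformed_rate(p_x, channels, g):
    """Case I: the decoder sees only the G-averaged channel."""
    mixture = sum(w * ch for w, ch in zip(g, channels))
    return mutual_information(p_x, mixture)

def informed_rate(p_x, channels, g):
    """Case II: E_G of the mutual information for each fixed quantizer."""
    return sum(w * mutual_information(p_x, ch) for w, ch in zip(g, channels))

channels = [bsc(0.1), bsc(0.9)]   # hypothetical channels for two quantizers
g = [0.5, 0.5]                    # randomization over the quantizers
p_x = [0.5, 0.5]
case_one = uninformed_rate(p_x, channels, g)
case_two = informed_rate(p_x, channels, g)
```

Here the averaged channel is useless (crossover one half) while each individual channel is quite good, so the informed decoder gains the full difference.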


We are now ready to state the results. In brief, our results show that when
the decoder is informed of the quantization rule then (under a compatibility
assumption) there is a saddlepoint in Cases A and B, i.e. the jammer's rate and the
communicator's rate are equal (Theorem 5). However, when the decoder is not
informed of the quantization rule, the jammer's rate and the communicator's
rate may differ. Nevertheless the optimal distributions, $F$ from the communicator's
point of view and $G$ from the jammer's point of view, are concentrated on a finite
number of points (in all the Cases A, B, C and D) (Theorem 1). This converts a
functional optimization problem into a finite-dimensional non-linear programming
problem.

2.3 Case AI: Decoder Uninformed

The communicator has to determine the distributions $(dG(\theta), dP(x))$ that
maximize the amount of information, $I(G,P;F)$, transmitted. The jammer has
to find the noise distribution $dF(z)$ that minimizes the information received by the
decoder. Thus, the quantizer's goal is to achieve

$$\max_{dG(\theta),\,dP(x)}\ \min_{dF(z)}\ I(G,P;F)$$

whereas the jammer wants to achieve

$$\min_{dF(z)}\ \max_{dG(\theta),\,dP(x)}\ I(G,P;F).$$

In this section we show that for any choice of strategy of either player there is
a simple characterization of the optimal reaction strategy of his opponent.

Theorem 1: a) The jammer can achieve the minimum in

$$\max_{dG(\theta),\,dP(x)}\ \min_{dF(z)}\ I(G,P;F)$$

with a distribution concentrated at at most $M(L-1)+2$ points.

b) The communicator can achieve the maximum in

$$\min_{dF(z)}\ \max_{dG(\theta),\,dP(x)}\ I(G,P;F)$$

with a distribution concentrated at at most $M(L-1)+1$ points.


Discussion: Theorem 1(a) says that the communicator, in trying to achieve
$\max_{dG(\theta),\,dP(x)} \min_{dF(z)} I(G,P;F)$, has to consider only reaction strategies of the jammer
that have a finite number of points of support, i.e. for each $(dG(\theta), dP(x))$ chosen
by the communicator the worst-case jammer distribution may be assumed to be
concentrated at a finite number of points, and this number is bounded uniformly
(in $(dG(\theta), dP(x))$) by $M(L-1)+2$. It follows that for a fixed quantizer (i.e. no
randomization of the quantizer) the worst-case jammer is one who chooses such a
finite-dimensional distribution. Similarly, Theorem 1(b) says that the jammer may,
from his perspective of trying to achieve $\min_{dF(z)} \max_{dG(\theta),\,dP(x)} I(G,P;F)$, consider
only finite-dimensional reaction strategies on the communicator's part.

To prove these results we use the following facts: (1) the convexity and concavity
properties of the mutual information function (it is convex in the channel
transition matrix and concave in the input distribution); (2) the equivalence of
weak convergence with Levy convergence in our situation (see Appendix B), which
we use to show the continuity of our objective function in the strategies as well as
the compactness of our strategy sets (see Appendix C) (this allows us to conclude that
there is a worst-case jamming strategy and a best-case communicator strategy); and
(3) Dubins' Theorem, in order to demonstrate that the optimal reaction strategies
are described by distributions concentrated on a finite number of points. Dubins'
Theorem allows the extreme points of certain convex sets to be written as finite
convex combinations of extreme points of larger convex sets.

Proof  of  Theorem  1: 

We prove part (a) in detail. The modifications required to obtain part (b) are
straightforward. We start by first proving two intermediate results, Lemmas 1 and 2.

Lemma 1: $I(G,P;F)$ is a Levy-continuous functional of $dF(z)$ for any fixed
$(dG(\theta), dP(x))$.

Proof of Lemma 1:

First we note that for every $(dG(\theta), dP(x))$, $I(P_{yx})$ is a convex function of
$P_{yx}$ [Csiz 81, pg. 50], i.e.,

$$I\big(\alpha P^1_{yx} + (1-\alpha) P^2_{yx}\big) \le \alpha I(P^1_{yx}) + (1-\alpha) I(P^2_{yx}), \qquad 0 \le \alpha \le 1,$$

and

$$p(y|x,z) = \int_\Theta p(y|x,\theta,z)\,dG(\theta)$$

is a continuous function of $z$ (since $p(y|x,\theta,z)$ is continuous in $z$ and $p(y|x,\theta,z) \le 1$,
this follows from the Dominated Convergence Theorem). Also

$$p(y|x) = \int_K \int_\Theta p(y|x,\theta,z)\,dG(\theta)\,dF(z) = \int_K p(y|x,z)\,dF(z).$$

Hence $p(y|x)$ is a Levy-continuous functional of $dF(z)$ and therefore $P_{yx}$ is a
Levy-continuous functional of $dF(z)$.

Now $I(G,P;F)$ is a convex function of $P_{yx}$ and hence it is continuous in
the interior of the finite-dimensional set $W$ of all stochastic matrices. (Thus,
$I(G,P;F)$ is continuous at any point $P_{yx}$ such that at least one row of $P_{yx}$ is
not a one-point distribution, i.e. $P_{yx}$ is not deterministic.) Hence, $I(G,P;F)$ is
a Levy-continuous function of $dF(z)$ for any fixed $(dG(\theta), dP(x))$. □

Let $S$ be the set of all probability distributions on the Borel subsets of $K$, and let

$$S^1 = \Big\{ dF(z) \in S : \int_K f(z)\,dF(z) = K_J \Big\} \tag{2.5}$$

be a hyperplane in $S$.

Lemma 2: $I(G,P;F)$ achieves its maximum (minimum) in $S^1$.

Proof  of  Lemma  2: 

We note that $S$ is compact in the Levy topology (Appendix C).

Also $S^1$ is a hyperplane in $S$ which is closed in the Levy topology (since
$dF(z) \mapsto \int_K f(z)\,dF(z)$ is Levy-continuous).

Hence $S^1$, being a closed subset of a compact set, is itself (Levy) compact.
Lemma 1 asserts that for fixed $(dG(\theta), dP(x))$, $I(G,P;F)$ is a Levy-continuous
functional on the compact set $S^1$. Hence it achieves its minimum (maximum)
at some point $dF^*(z) \in S^1$. □

The  above  lemmas  are  used  to  complete  the  proof  of  Theorem  1. 

From Lemma 2 we know that $I(G,P;F)$ achieves its minimum in $S^1$. Let
$dF^*(z)$ be a distribution which achieves $\min_{dF(z)} I(G,P;F)$. Denote the corresponding
$P_{yx}$ as $P^*_{yx} = [p^*(y|x)]$, i.e.

$$p^*(y|x) = \int_K \int_\Theta p(y|x,\theta,z)\,dG(\theta)\,dF^*(z). \tag{2.6}$$

Now consider the set

$$A = \Big\{ dF(z) \in S^1 : \int_K \int_\Theta p(y|x,z,\theta)\,dG(\theta)\,dF(z) = p^*(y|x) \ \text{for all input letters } x \text{ and all } y \in B^1 \Big\} \tag{2.7}$$

where $B^1 = \{0, 1, \ldots, L-2\}$. The set $A$ is the intersection of $S$ with $M(L-1)+1$
hyperplanes, viz. $S^1$ and the $M(L-1)$ hyperplanes

$$h_{yx} = \Big\{ dF(z) \in S^1 : \int_K \int_\Theta p(y|x,z,\theta)\,dG(\theta)\,dF(z) = p^*(y|x) \Big\}. \tag{2.8}$$

Furthermore:

$S$ is convex.

$S$ is linearly bounded ($S$, being compact in a metric space, is bounded and hence
its intersection with any line is bounded).

$S$, being a compact subset of a metric space, is closed, and any line $l$ in the metric
space is closed. Thus $S$ is also linearly closed.

Hence we have that $S$ is a convex, linearly closed and linearly bounded set.
By Dubins' Theorem [Dubi 62] we can conclude that, since $A$ is the intersection
of $S$ with $M(L-1)+1$ hyperplanes, every extreme point of $A$ is a convex
combination of $M(L-1)+2$ or fewer extreme points of $S$.

From our construction of $A$ we know that $I(G,P;F)$ is constant on $A$. Hence
for fixed $(dG(\theta), dP(x))$, $I(G,P;F)$ assumes its minimum value at an extreme
point of $A$ also.

Hence, $I(G,P;F)$ assumes its minimum value at some point $dF(z)$ which is
a convex combination of $M(L-1)+2$ or fewer extreme points of $S$.

Since the extreme points of $S$ are the one-point distributions, we can finally
assert that for each $(dG(\theta), dP(x))$ the jammer can achieve the minimum in

$$\max_{dG(\theta),\,dP(x)}\ \min_{dF(z)}\ I(G,P;F)$$

with a distribution concentrated at at most $M(L-1)+2$ points. This concludes the
proof of (a).
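The finite-support conclusion has a simple finite-dimensional analogue that can be checked numerically. The sketch below is not Dubins' Theorem itself but a standard Caratheodory-type elimination: starting from a distribution on many candidate points, it repeatedly moves mass along a null direction of the constraint matrix until at most (number of linear constraints + 1) points carry mass, while total mass and every constrained integral are preserved. With $M = L = 2$ there are $M(L-1) = 2$ channel entries plus one power constraint, so at most $M(L-1)+2 = 4$ points survive. The functions of $z$ used for the channel entries and the power, and the name `reduce_support`, are hypothetical placeholders.

```python
import numpy as np

def reduce_support(constraint_values, weights, tol=1e-12):
    """Return weights with at most (k + 1) nonzero entries that preserve the
    total mass and the k linear functionals constraint_values @ weights.

    constraint_values: array (k, n); row j is the j-th constraint function
                       evaluated at each of the n candidate support points.
    weights:           array (n,), a probability vector on those points.
    """
    w = np.array(weights, dtype=float)
    A = np.vstack([np.ones(w.size), np.asarray(constraint_values, dtype=float)])
    while True:
        support = np.flatnonzero(w > tol)
        if support.size <= A.shape[0]:
            break
        # More support points than constraints: A[:, support] has a null vector.
        _, _, vt = np.linalg.svd(A[:, support])
        v = vt[-1]                       # unit vector with A[:, support] @ v ~ 0
        positive = v > 1e-14             # non-empty because the entries of v sum to 0
        steps = w[support][positive] / v[positive]
        t = steps.min()                  # largest step keeping all weights >= 0
        w[support] = np.maximum(w[support] - t * v, 0.0)
        w[support[positive][np.argmin(steps)]] = 0.0   # this point leaves the support
    w[w <= tol] = 0.0
    return w

rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=50)       # candidate jammer power levels
weights = rng.random(50)
weights /= weights.sum()
constraints = np.vstack([
    1.0 - 0.4 * z,                       # hypothetical p(y=0 | x=0, z)
    0.3 * np.sqrt(z),                    # hypothetical p(y=0 | x=1, z)
    z ** 2,                              # hypothetical power function f(z)
])
reduced = reduce_support(constraints, weights)
```

The reduced distribution induces the same channel matrix and uses the same average power as the original one, so it yields the same mutual information with far fewer mass points.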

For channels which are symmetric for each $\theta$ and $z$, i.e. $p(y|x_i,z,\theta)$ is some
permutation of $p(y|x_j,z,\theta)$, we see that the set $A$ is actually the intersection of $S$
with $(L-1)+1$ hyperplanes only, and hence part (a) of the theorem holds with
$(L-1)+2 = L+1$ instead of $M(L-1)+2$. For $M$-ary symmetric channels,
i.e. channels with $M$ inputs and $M$ outputs and such that for each $\theta$ and $z$,
$p(y_i|x_i,z,\theta) = 1-\epsilon$ and $p(y_i|x_j,z,\theta) = \epsilon/(M-1)$, $i \neq j$, the bound on the number
of points of support reduces to 3.

For (b) we note that the jammer wants to achieve

$$\min_{dF(z)}\ \max_{dG(\theta),\,dP(x)}\ I(G,P;F).$$

This may be written as

$$\min_{dF(z)}\ \max_{dG(\theta)}\ C(G,F)$$

where $C(G,F) = \max_{dP(x)} I(G,P;F)$.

We note that, similarly to Lemma 1, for any fixed $dF(z)$, $C(G,F)$ is a continuous
functional of $dG(\theta)$. (Simply note that $C(G,F)$, being the maximum of
functions convex in $P_{yx}$, is also convex in $P_{yx}$ and proceed as before.) Using our
hypothesis that $p(y|x,\theta,z)$ is continuous in $\theta$ we can show that the maximum in

$$\min_{dF(z)}\ \max_{dG(\theta)}\ C(G,F)$$

can for any $dF(z)$ be achieved by the decoder/quantizer by a distribution $dG(\theta)$
that is concentrated at at most $M(L-1)+1$ points.

Again for symmetric channels we note that part (b) of the theorem holds with
$L$ instead of $M(L-1)+1$. For $M$-ary symmetric channels this number is 2. The
number of points of support is one less than in part (a) as we have not imposed any
constraints on the distributions $dG(\theta)$ chosen by the quantizer. □

2.3.1  Necessary  and  Sufficient  Conditions 

We now characterize the finite-dimensional distributions of Section 3.1 by means
of necessary and sufficient conditions. We first briefly introduce the necessary
definitions and results from optimization theory and then specialize them to our cases.

Let $\Omega$ be a convex set and let $f$ be a function from $\Omega$ into $R$. For some
fixed $x_0 \in \Omega$, if for all $x \in \Omega$ the limit

$$\lim_{\alpha \downarrow 0} \frac{f\big((1-\alpha)x_0 + \alpha x\big) - f(x_0)}{\alpha}$$

exists, $f$ is said to be weakly differentiable at $x_0$ and the above limit is denoted
as $f'_{x_0}(x)$, the weak derivative at $x_0$. If $f$ is weakly differentiable in $\Omega$ at $x_0$ for all
$x_0$ in $\Omega$, $f$ is said to be weakly differentiable in $\Omega$. We now state an Optimization
Theorem that follows from [Luen 69, pg. 178].

Optimization Theorem: Let $f$ be a continuous, weakly differentiable, convex-cap
(concave) map from a compact, convex set $\Omega$ to $R$. Let

$$C = \sup_{x \in \Omega} f(x). \tag{2.10}$$

Then

1. $C = \max f(x) = f(x_0)$ for some $x_0 \in \Omega$.

2. A necessary and sufficient condition for $f(x_0) = C$ is $f'_{x_0}(x) \le 0$ for
all $x \in \Omega$.
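The weak-derivative condition can be checked numerically on a familiar concave problem. The sketch below takes $f$ to be the mutual information of a binary symmetric channel as a function of the input distribution (an illustrative choice, not the functional used later in this section) and approximates the weak derivative by a one-sided difference quotient with a small step. At the uniform input, which is optimal for this channel, the derivative toward every tested direction is non-positive; at a skewed input it is strictly positive toward the uniform distribution, so the skewed input cannot be optimal.

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X;Y) in bits for input distribution p_x and row-stochastic channel[x, y]."""
    p_x = np.asarray(p_x, dtype=float)
    channel = np.asarray(channel, dtype=float)
    joint = p_x[:, None] * channel
    p_y = joint.sum(axis=0)
    mask = joint > 0
    ratio = channel[mask] / np.broadcast_to(p_y, joint.shape)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

def weak_derivative(f, x0, x, alpha=1e-6):
    """Difference-quotient approximation of f'_{x0}(x) with a small step alpha."""
    x0 = np.asarray(x0, dtype=float)
    x = np.asarray(x, dtype=float)
    return (f((1.0 - alpha) * x0 + alpha * x) - f(x0)) / alpha

channel = np.array([[0.9, 0.1], [0.1, 0.9]])      # illustrative BSC, crossover 0.1

def f(p):
    return mutual_information(p, channel)

directions = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.3, 0.7])]
at_optimum = [weak_derivative(f, [0.5, 0.5], d) for d in directions]
at_skewed = weak_derivative(f, [0.9, 0.1], [0.5, 0.5])
```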

Constrained Optimization Theorem: [Luen 69, pg. 217] Let $\Omega$ be a convex
subset of a linear vector space and $f$ and $g$ convex-cap functionals on $\Omega$ to
$R$. Assume there is an $x_1 \in \Omega$ such that $g(x_1) < 0$ and let

$$C' = \sup_{x \in \Omega,\ g(x) \le 0} f(x). \tag{2.11}$$

If $C'$ is finite then there exists a constant $\lambda \ge 0$ such that

$$C' = \sup_{x \in \Omega} \big[ f(x) - \lambda g(x) \big]. \tag{2.12}$$

Furthermore, if the supremum in the first equation is achieved by $x_0 \in \Omega$ and
$g(x_0) \le 0$, then this supremum is achieved by $x_0$ in the second equation and
$\lambda g(x_0) = 0$.

Now given any $(dG(\theta), dP(x))$ and the power constraint

$$\int f(z_1, z_2, \ldots, z_D)\,dF(z_1, z_2, \ldots, z_D) \le K_J$$

we define

$$U_c(K_J, G) = \sup_{F \in S,\ h_F \le K_J} -I(G,P;F) \tag{2.13}$$

where $h_F = \int_K f(z)\,dF(z)$. To simplify notation we define

$$D : S \to R \quad \text{by} \quad D(F) = \int_K f(z)\,dF(z) - K_J. \tag{2.14}$$

Using the Constrained Optimization Theorem we will infer in Theorem 2 that
there exists a non-negative constant $\lambda = \lambda(G, K_J)$ for $D(F) \le 0$ such that

$$U_c(G, K_J) = \sup_{F \in S} \big[ -I(G,P;F) - \lambda D(F) \big]. \tag{2.15}$$

We now formulate necessary and sufficient conditions for the characterization of
the optimal distributions of Theorem 1 in the following two theorems.

Theorem 2: $U_c(G, K_J)$ is achieved by a distribution $F_0 \in S$ satisfying $D(F_0) \le 0$,
and a necessary and sufficient condition for $U_c(G, K_J) = -I(G,P;F_0)$ is that
for some constant $\lambda \ge 0$

$$\int_K \big[ -i(z;G,F_0) - \lambda f(z) \big]\,dF(z) \le -I(G,P;F_0) - \lambda K_J \tag{2.16}$$

for all $F \in S$, where

$$i(z;G,F_0) = \sum_{x,y} P(x)\,p(y|x,z) \log \frac{\int p(y|x,z)\,dF_0(z)}{\sum_x P(x) \int p(y|x,z)\,dF_0(z)}.$$

Proof of Theorem 2:

$D : S \to R$ is clearly linear, bounded, convex-cap, continuous and weakly
differentiable in $S$ with $D'_{F_1}(F_2) = D(F_2) - D(F_1)$. By choosing $F_1$ as a
distribution with unit mass placed appropriately we can infer that $D(F_1) < 0$. Next we
show that $I(G,P;F)$ is convex in $F$.

$$\begin{aligned}
I(G,P;\alpha F_1 + (1-\alpha)F_2) &= I\big(P_{yx}(G;\alpha F_1 + (1-\alpha)F_2)\big) \\
&= I\Big(\int_K \int_\Theta p(y|x,\theta,z)\,dG(\theta)\,\big(\alpha\,dF_1 + (1-\alpha)\,dF_2\big)\Big) \\
&= I\big(\alpha P_{yx}(G;F_1) + (1-\alpha)P_{yx}(G;F_2)\big) \\
&= I\big(\alpha P^1_{yx} + (1-\alpha)P^2_{yx}\big) \\
&\le \alpha I(P^1_{yx}) + (1-\alpha) I(P^2_{yx}) \quad \text{(by the convexity of } I(\cdot) \text{ w.r.t. } P_{yx}) \\
&= \alpha I(G,P;F_1) + (1-\alpha) I(G,P;F_2).
\end{aligned} \tag{2.17}$$

Then, since $U_c(G, K_J)$ is finite we can infer from the Constrained Optimization
Theorem that there exists some constant $\lambda \ge 0$ such that
$U_c = \sup_{F \in S} \big[ -I(G,P;F) - \lambda D(F) \big]$.

Now we show that $I(G,P;F)$ is weakly differentiable at all $F \in S$.

Let $L(\alpha) = I(G,P;(1-\alpha)F_1 + \alpha F_2)$. Since $I(G,P;F)$ is convex in $F$, $L(\alpha)$
is convex in $\alpha$. Therefore $\frac{L(\alpha) - L(0)}{\alpha}$ is non-decreasing in $\alpha$ and bounded from
below, and thus $\lim_{\alpha \downarrow 0} \frac{L(\alpha) - L(0)}{\alpha}$ exists. Furthermore:

Lemma 3: $I'_{F_1}(G,P;F_2) = \int_K i(z;G,F_1)\,dF_2(z) - I(G,P;F_1)$.

Proof of Lemma 3:

See Appendix D.

We now have that $-I(G,P;F) - \lambda D(F)$ is convex-cap, continuous and weakly
differentiable in $F$. Thus, by the Optimization Theorem there is a distribution
function $F_0 \in S$ such that $U_c(G, K_J) = -I(G,P;F_0) - \lambda D(F_0)$. The
necessary and sufficient condition becomes

$$-I'_{F_0}(G,P;F) - \lambda D'_{F_0}(F) \le 0 \quad \text{for all } F \in S \tag{2.18}$$

or

$$\int_K \big[ -i(z;G,F_0) - \lambda f(z) \big]\,dF(z) \le -I(G,P;F_0) - \lambda h_{F_0}. \tag{2.19}$$

If $h_{F_0} < K_J$ the power constraint is trivial and the constant $\lambda$ is zero, i.e.
$D(F_0) < 0$ but $\lambda D(F_0) = 0$. In either case $\lambda h_{F_0} = \lambda K_J$, so (2.19) coincides
with (2.16). Thus the necessary and sufficient condition is established. □

From Theorem 1 we know that it is possible to find $F_0$ from the set of
distributions with a finite number of points of support. Finding such an $F_0$
entails determining the set of points of increase as well as the amounts of increase
of $F_0$ at those points. Let $E_0$ denote the set of points of increase of $F_0$. We
now show

Theorem 3: Let $F_0$ be a probability distribution satisfying the power constraint.
Then $F_0$ achieves $U_c(G, K_J)$ iff for some $\lambda \ge 0$

C1) $-i(z;G,F_0) \le -I(G,P;F_0) + \lambda\big(f(z) - K_J\big)$ for all $z \in K$,

C2) $-i(z;G,F_0) = -I(G,P;F_0) + \lambda\big(f(z) - K_J\big)$ for all $z \in E_0$.


Proof  of  Theorem  3: 


The sufficiency is clear because if both conditions C1 and C2 hold, the conditions of
Theorem 2 hold. We show the necessity.

Assume that $F_0$ is "optimal" but C1 is not true. Then there must exist some
$z_1 \in K$ such that $-i(z_1;G,F_0) > -I(G,P;F_0) + \lambda\big(f(z_1) - K_J\big)$. Let $F_1(z)$
be a probability distribution with a unit increase at such a point $z_1 \in K$. Then

$$\int_K \big[ -i(z;G,F_0) - \lambda f(z) \big]\,dF_1(z) > -I(G,P;F_0) - \lambda K_J \tag{2.20}$$

which contradicts Theorem 2. Hence C1 must be true.

Now assume that $F_0$ is "optimal" but C2 is not true. Then, since C1 is true,
$-i(z;G,F_0) < -I(G,P;F_0) + \lambda\big(f(z) - K_J\big)$ for all $z$ in $E'$, where $E'$ is
some subset of $E_0$ with positive measure, i.e.

$$\int_{E'} dF_0(z) = c > 0.$$

Since $\int_{E_0 - E'} dF_0(z) = 1 - c$ and on $E_0 - E'$

$$i(z;G,F_0) = I(G,P;F_0) - \lambda\big(f(z) - K_J\big) \tag{2.21}$$

and

$$\begin{aligned}
\int_K \big[ -i(z;G,F_0) - \lambda f(z) \big]\,dF_0(z) &= \int_{E'} \big[ -i(z;G,F_0) - \lambda f(z) \big]\,dF_0(z) \\
&\quad + \int_{E_0 - E'} \big[ -i(z;G,F_0) - \lambda f(z) \big]\,dF_0(z) \\
&\quad + \int_{K - E_0} \big[ -i(z;G,F_0) - \lambda f(z) \big]\,dF_0(z)
\end{aligned} \tag{2.22}$$

we have

$$-I(G,P;F_0) - \lambda K_J < -I(G,P;F_0) - \lambda K_J, \tag{2.23}$$

i.e. a contradiction. Hence C2 must be true too. □

Theorems 1 and 3 reduce the calculation of the distributions describing the
reaction strategies to finite-dimensional non-linear programming problems. They
can be used to simplify the search for conservative strategies which are optimal for
either player. In Theorem 4 below we assert the existence of conservative strategies
for each player.

Theorem 4: For the game described in Case AI, there exists a conservative
strategy $(dG(\theta), dP(x))$ for the communicator and a conservative strategy $dF(z)$
for the jammer, i.e. strategies such that

$$\text{i)} \quad \min_{dF(z)} I(G,P;F) = \max_{dP(x),\,dG(\theta)}\ \min_{dF(z)}\ I(G,P;F)$$

and

$$\text{ii)} \quad \max_{dP(x),\,dG(\theta)} I(G,P;F) = \min_{dF(z)}\ \max_{dP(x),\,dG(\theta)}\ I(G,P;F). \tag{2.24}$$


Proof  of  Theorem  4: 

From Lemmas 1 and 2 we note that

a) $I(G,P;F)$ is lower-semicontinuous in $dF(z)$ for each $(dG(\theta), dP(x))$ and

b) there exists $(dG(\theta), dP(x))$ such that $I(G,P;F)$ is lower semi-compact in $dF(z)$.

Theorem 4(i) now follows from a fundamental existence theorem [Aubi 82, pg 209,
Th. 1]. Theorem 4(ii) follows similarly. □


2.3.2  The  Remaining  Cases 

Case BI: With $F(z)$ now recognized as a one-dimensional distribution, Theorems 1
and 2 are easily seen to be true.

Case CI: We redefine $S$ as follows: $S = \bigcup_{i=1}^{M} L_i$, where $L_i$ is the space of
product distributions such that at most the $i$-th variable $Z_i$ is nonzero (cf. Case C).

By previous arguments each $L_i$ is Levy compact and hence so is $S$. Now the
proofs of Th. 1 and Th. 2 follow as before.

Case DI: We perform the analysis by fixing $D-1$ of the $D$ distributions
$dF_1, \ldots, dF_D$. By minor modifications in the proof of Lemma 1 we see that $I(X;Y)$
is a Levy-continuous functional of $dF_i(z_i)$ for each $i$. Defining $S$ and $S^1$ similarly,
except that now both are spaces of distributions $dF_i(z_i)$ instead of $dF(z)$, we see
that for each $(dG(\theta), dP(x))$ the jammer can achieve the minimum in

$$\max_{(dG(\theta),\,dP(x))}\ \min_{dF(z) = dF_1(z_1)\,dF_2(z_2) \cdots dF_D(z_D)}\ I(G,P;F) \tag{2.25}$$

with a distribution $dF_i$ concentrated at at most $M(L-1)+2$ points.

Since $i$ is arbitrary we can assert that the jammer can achieve the minimum
in (2.25) with distributions $dF_i$, $i = 1, \ldots, D$, each of which is concentrated at at
most $M(L-1)+2$ points. Part (b) of Theorem 1 and Theorem 2 are easily seen
to be true as stated for this case.

2.4 Case AII: Decoder Informed

We have an arbitrary joint distribution on $Z_1, \ldots, Z_D$. The jammer chooses
$dF(z)$. The communicator chooses $dG(\theta)$ and further the decoder knows $\theta$. The
jammer knows only the set of quantizers. He also knows that the decoder is
aware of the actual quantizer chosen.

In this case we make a "compatibility" assumption, that is, for every $\theta$ and
$dF(z)$ the capacity-achieving input distribution $dP(x)$ remains the same. While
"compatibility" certainly restricts the applicability of our model, we show by example
that it is often a worst-case assumption. For instance, we know [Dobr 59] that if
$M = L$ and if the jammer's strategy set is restricted such that for each distribution
$dF(z)$ and quantizer $\theta$, Prob$\{\text{error}\,|\,x\} \le \epsilon$ for every $x$, then the saddle-point
strategy for the jammer is to choose a distribution such that

$$p(y|x) = \frac{1}{M} \ \text{ for all } y, x \quad \text{if } \epsilon \ge 1 - \frac{1}{M}$$

and

$$p(y|x) = \begin{cases} \dfrac{\epsilon}{M-1}, & y \neq x \\ 1 - \epsilon, & y = x \end{cases} \quad \text{if } \epsilon \le 1 - \frac{1}{M}$$

and the saddle-point strategy for the communicator is to choose a uniform
distribution on the input alphabet. In our model this corresponds to choosing the
canonical noise variables so that $p(y|x,\theta)$ is a symmetric channel for each $\theta$. Such
symmetry (and thereby "compatibility") is obtained in a number of other situations
as a saddle-point strategy. Under certain conditions, when we have convex
constraints in the $M$ noise variables affecting the $M$ inputs of the channel which
are invariant under any permutation of the $M$ variables (i.e. a "symmetric"
constraint), then the choice of a uniform distribution on the input and the choice of a
symmetric channel are saddle-point strategies for the communicator and the jammer
respectively (see Appendix E). To describe one more example, suppose we have $M$
inputs and $M$ outputs, and when input $j$ is sent

$$y_i = n_i, \quad i = 1, \ldots, M,\ i \neq j$$
$$y_i = A + n_i, \quad i = j,$$

where the $n_i$ are $N(0, v_i)$, $i = 1, \ldots, M$, independent random variables, and there is further
the constraint $\sum_i v_i = c$. Then by arguments similar to those in Appendix B the
saddle-point strategy is to choose $v_i = c/M$ and a uniform distribution on the input.
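The symmetric-channel example can be checked numerically. The sketch below builds the $M$-ary symmetric channel with error probability $\epsilon$, evaluates the mutual information at the uniform input, and compares it with the standard closed form $\log_2 M + (1-\epsilon)\log_2(1-\epsilon) + \epsilon\log_2\big(\epsilon/(M-1)\big)$. The values $M = 4$, $\epsilon = 0.2$ and the helper names are illustrative assumptions. At $\epsilon = 1 - 1/M$ every row of the channel is uniform and the rate drops to zero, matching the first branch of the jammer's strategy.

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X;Y) in bits for input distribution p_x and row-stochastic channel[x, y]."""
    p_x = np.asarray(p_x, dtype=float)
    channel = np.asarray(channel, dtype=float)
    joint = p_x[:, None] * channel
    p_y = joint.sum(axis=0)
    mask = joint > 0
    ratio = channel[mask] / np.broadcast_to(p_y, joint.shape)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

def mary_symmetric(M, eps):
    """M-ary symmetric channel: 1 - eps on the diagonal, eps / (M - 1) elsewhere."""
    W = np.full((M, M), eps / (M - 1))
    np.fill_diagonal(W, 1.0 - eps)
    return W

def closed_form_capacity(M, eps):
    """Capacity of the M-ary symmetric channel in bits (valid for 0 < eps < 1)."""
    return np.log2(M) + (1 - eps) * np.log2(1 - eps) + eps * np.log2(eps / (M - 1))

M = 4
uniform = np.full(M, 1.0 / M)
rate = mutual_information(uniform, mary_symmetric(M, 0.2))
jammed = mutual_information(uniform, mary_symmetric(M, 1.0 - 1.0 / M))
skewed = mutual_information([0.7, 0.1, 0.1, 0.1], mary_symmetric(M, 0.2))
```

The skewed input does strictly worse than the uniform one on this channel, consistent with the uniform distribution being the communicator's saddle-point strategy.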


Utilization of the "compatibility" assumption allows us to write the two programs as

$$\min_{dF(z)}\ \max_{dG(\theta)}\ E_G\big(C(\theta,F)\big)$$

and

$$\max_{dG(\theta)}\ \min_{dF(z)}\ E_G\big(C(\theta,F)\big)$$

where $C(\theta,F) = \max_{dP(x)} I(\theta;F)$ and $I(\theta;F) = I(X;Y|\theta)$.

In this section we prove the existence of a saddlepoint. The main result is
stated in the following theorem:

Theorem 5: There exists a pair of distributions $(dG^*(\theta), dF^*(z))$ such that

$$E_G\big(C(\theta,F^*)\big) \le E_{G^*}\big(C(\theta,F^*)\big) \le E_{G^*}\big(C(\theta,F)\big)$$

for all feasible $dG(\theta)$, $dF(z)$, i.e., $(dG^*(\theta), dF^*(z))$ is a saddle point for the game
in Case AII.

Proof of Theorem 5: The set of all feasible $dF$'s, i.e.

$$\Big\{ dF(z) : \int_K f(z)\,dF(z) \le K_J \Big\}, \qquad 0 \le z_i \le b,$$

is clearly convex and compact. The set of all $dG$'s is also convex and compact.

We note that for any fixed $dF(z)$, $C(\theta,F)$ is a continuous function of $\theta$. By
our earlier arguments

$$P(y|x,\theta) = \int_K p(y|x,\theta,z)\,dF(z)$$

is a continuous function of $\theta$.

Hence, $P_{yx}(\theta)$ is a continuous function of $\theta$. Also $C(\theta,F) = C(P_{yx}(\theta))$ and
we know that $C(P_{yx}(\theta))$ is convex in $P_{yx}(\theta)$.

Therefore, for every $\theta \in \Theta$ such that $P_{yx}(\theta)$ is not deterministic, $C(P_{yx}(\theta))$ is a
continuous function of $P_{yx}(\theta)$. Hence, for fixed $dF(z)$, $C(\theta,F) = C(P_{yx}(\theta))$ is a
continuous function of $\theta$ and so

$$E_G\big(C(\theta,F)\big) = \int_\Theta C(\theta,F)\,dG(\theta) \tag{2.26}$$

is a Levy-continuous functional of $dG(\theta)$.


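Since $E_G(C(\theta,F))$ is linear in $dG(\theta)$, it is also a concave function of $dG(\theta)$.

Next we note that $C(\theta,F)$ is convex in $dF(z)$ for each $\theta$ since $C(\theta,F) = C(P_{yx}(\theta))$. Hence

$$C\big(\theta, \alpha F_1 + (1-\alpha)F_2\big) \le \alpha C(\theta,F_1) + (1-\alpha) C(\theta,F_2), \qquad 0 \le \alpha \le 1.$$

Taking expectations w.r.t. $G$,

$$\int_\Theta C\big(\theta, \alpha F_1 + (1-\alpha)F_2\big)\,dG(\theta) \le \int_\Theta \big( \alpha C(\theta,F_1) + (1-\alpha) C(\theta,F_2) \big)\,dG(\theta).$$

Thus

$$E_G\big(C(\theta, \alpha F_1 + (1-\alpha)F_2)\big) \le \alpha E_G\big(C(\theta,F_1)\big) + (1-\alpha) E_G\big(C(\theta,F_2)\big).$$

Consequently, $E_G(C(\theta,F))$ is a convex function of $dF(z)$.

Also $E_G(C(\theta,F))$ is Levy-continuous in $dF(z)$. To prove this it suffices to
show that for any sequence $F_n$ converging to $F$ in the Levy metric

$$E_G\big(C(\theta,F_n)\big) \to E_G\big(C(\theta,F)\big).$$

Since convergence in the Levy metric is in our case equivalent to weak convergence
(see Appendix B) it suffices to show this for $F_n \xrightarrow{w} F$. However,

$$\begin{aligned}
\lim_n E_G\big(C(\theta,F_n)\big) &= \lim_n \int_\Theta C(\theta,F_n)\,dG \\
&= \int_\Theta \lim_n C(\theta,F_n)\,dG \quad \text{(by the Dominated Convergence Theorem)} \\
&= \int_\Theta C(\theta,F)\,dG \quad \text{(since } C(\theta,F) \text{ is Levy-continuous in } F) \\
&= E_G\big(C(\theta,F)\big)
\end{aligned}$$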


which proves Levy-continuity in $dF(z)$. From these properties of the objective
function and the convexity and compactness of the feasible strategy sets we recognize
that the hypotheses of the Sion minmax theorem of game theory are satisfied
[Aubi 82, Th 7, pg 218]. This concludes the proof of Theorem 5. □

We note that these saddle-point distributions need not have finite support.
However, in this case we have an equilibrium and, with no further knowledge of
each other's choice of strategy, the jammer and the quantizer should be content
utilizing $dG^*(\theta)$ and $dF^*(z)$.

Using the Optimization Theorem and the Constrained Optimization Theorem
we can derive necessary and sufficient conditions at these saddle points. Given any
$dG(\theta)$ and the power constraint we define

$$U_c(K_J, G) = \sup_{F \in S,\ h_F \le K_J} -E_G\big(C(\theta,F)\big)$$

and given any $dF(z)$ we define

$$V_c(F) = \sup_{G \in Q} E_G\big(C(\theta,F)\big)$$

where $Q$ is the space of distributions on $\Theta$. Then we have

Theorem 6: The saddle-point strategies $dF^*$, $dG^*$ satisfy the following inequalities:

$$E_{G^*}\Big( \int \big( -i(z;\theta,F^*) - \lambda f(z) \big)\,dF(z) \Big) \le E_{G^*}\big( -C(\theta,F^*) \big) - \lambda K_J$$

for some $\lambda \ge 0$, for all $F$, where

$$i(z;\theta,F) = \sum_{x,y} P(x)\,p(y|x,z,\theta) \log \frac{\int p(y|x,z,\theta)\,dF(z)}{\sum_x P(x) \int p(y|x,z,\theta)\,dF(z)},$$

and

$$E_G\big(C(\theta,F^*)\big) \le E_{G^*}\big(C(\theta,F^*)\big)$$

for all $G$.

Proof  of  Theorem  6: 

For any $F$ denote the weak derivative of $E_G(C(\theta,F))$ at $G_0$ as $D_{G_0}\big(E_G(C(\theta,F))\big)$,
and for any $G$ denote the weak derivative of $E_G(C(\theta,F))$ at $F_0$ as $D_{F_0}\big(E_G(C(\theta,F))\big)$.
Using Lemma 3 and the Dominated Convergence Theorem, we have

$$D_{F_1}\big(E_G(-C(\theta,F_2))\big) = E_G\Big( -\int i(z;\theta,F_1)\,dF_2 \Big) + E_G\big(C(\theta,F_1)\big) \tag{2.31}$$

for any $F_1, F_2$.

Also

$$D_{G_1}\big(E_{G_2}(C(\theta,F))\big) = E_{G_2}\big(C(\theta,F)\big) - E_{G_1}\big(C(\theta,F)\big). \tag{2.32}$$

Now letting $F_1 = F^*$, $G = G^*$ in the first equation we have, using the Constrained
Optimization Theorem and the Optimization Theorem and the properties
of $E_G(C(\theta,F))$ as in Theorem 2, that a necessary and sufficient condition for $F^*$
to achieve $U_c(K_J, G^*)$ is

$$E_{G^*}\Big( \int \big( -i(z;\theta,F^*) - \lambda f(z) \big)\,dF(z) \Big) \le E_{G^*}\big( -C(\theta,F^*) \big) - \lambda K_J \tag{2.33}$$

for some $\lambda \ge 0$, for all $F$.

Letting $F = F^*$, $G_1 = G^*$ in the second equation gives us similarly that a
necessary and sufficient condition to achieve $V_c(F^*)$ is

$$E_G\big(C(\theta,F^*)\big) \le E_{G^*}\big(C(\theta,F^*)\big) \tag{2.34}$$

for all $G$.

Since at a saddle point $U_c(K_J, G^*)$ and $V_c(F^*)$ are simultaneously achieved,
the theorem follows. □

2.4.1  The  Remaining  Cases 

Case BII: Theorem 5 holds with $F(z)$ as a one-dimensional distribution.

Case CII: Although $S$ is compact, it is not convex and so we cannot demonstrate
that there is a saddle-point strategy.

Case DII: Again we have that $E_G(C(\theta,F))$ is a Levy-continuous functional
of $dG(\theta)$ and is concave in $dG(\theta)$. Also $E_G(C(\theta,F))$ is Levy-continuous in
$(dF_1(z), \ldots, dF_D(z))$. However, $E_G(C(\theta,F_1,\ldots,F_D))$ is not convex in $(F_1,\ldots,F_D)$.
Hence we cannot assert the existence of a saddle point in this case.
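The role of convexity in these existence results can be illustrated with a toy payoff. The sketch below is a hypothetical example, not derived from the channel model: two quantizers are mixed with probability $g$, the jammer picks a scalar parameter $t \in [0,1]$, and the two "capacities" $t^2$ and $(1-t)^2$ are convex in $t$. The payoff is then linear in $g$ and convex in $t$, as in Theorem 5, and a grid search finds equal max-min and min-max values. Restricting the jammer to the non-convex set $\{0, 1\}$ opens a gap between the two values, which is the kind of failure that prevents a saddle-point claim in Case CII.

```python
import numpy as np

def maxmin_minmax(payoff):
    """payoff[i, j]: the row player maximizes, the column player minimizes."""
    maxmin = payoff.min(axis=1).max()
    minmax = payoff.max(axis=0).min()
    return maxmin, minmax

g = np.linspace(0.0, 1.0, 101)      # probability of choosing quantizer 1
t = np.linspace(0.0, 1.0, 101)      # hypothetical scalar jammer parameter

c1 = t ** 2                          # hypothetical C(theta_1, F_t), convex in t
c2 = (1.0 - t) ** 2                  # hypothetical C(theta_2, F_t), convex in t
payoff = g[:, None] * c1[None, :] + (1.0 - g[:, None]) * c2[None, :]

# Convex jammer strategy set [0, 1]: the two values agree (saddle point).
convex_maxmin, convex_minmax = maxmin_minmax(payoff)

# Non-convex jammer strategy set {0, 1}: the two values separate.
nonconvex_maxmin, nonconvex_minmax = maxmin_minmax(payoff[:, [0, -1]])
```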

2.5  Fixed  Quantizer 

Before concluding this chapter we point out that if we did not have randomized
quantization then, even without "compatibility", the game would have a saddle point
where the jammer's saddle-point distribution need be concentrated at at most
$M(L-1)+2$ points. We summarize this in Theorem 7.

Theorem 7: For any quantizer $\theta$, there exists a pair of distributions $dP^*(x)$, $dF^*(z)$
such that

$$I(\theta,P,F^*) \le I(\theta,P^*,F^*) \le I(\theta,P^*,F) \tag{2.35}$$

for all feasible $dP$, $dF$. Moreover $dF^*(z)$ can be chosen to be concentrated at at
most $M(L-1)+2$ points, and necessary and sufficient conditions for $dF^*(z)$ and
$dP^*(x)$ are that for some $\lambda_1, \lambda_2 \ge 0$

$$-i(z;\theta,F^*) \le -I(\theta,P^*,F^*) + \lambda_1\big(f(z) - K_J\big) \tag{2.36}$$

for all $z \in K$ and

$$-i(z;\theta,F^*) = -I(\theta,P^*,F^*) + \lambda_1\big(f(z) - K_J\big) \tag{2.37}$$

for all $z \in E_0$, where $i(z;\ldots)$ is as defined in Theorem 2 with $G$ concentrated on $\theta$,

$$I_x(\theta,P^*,F^*) = \lambda_2 \tag{2.38}$$

for all $x$ such that $P^*(x) > 0$ and

$$I_x(\theta,P^*,F^*) \le \lambda_2 \tag{2.39}$$

for all $x$ such that $P^*(x) = 0$, where

$$I_x(\theta,P^*,F^*) = \sum_y p(y|x,\theta) \log \frac{p(y|x,\theta)}{\sum_{x'} P^*(x')\,p(y|x',\theta)}, \qquad p(y|x,\theta) = \int_K p(y|x,\theta,z)\,dF^*(z).$$


Proof  of  Theorem  7: 

From  me  proof  of  Theorem  5  we  know  that  all  we  need  to  show  is  that  1(9 ,  P.  F) 
is  (Levy)  continuous  in  d.P(x).  We  show  this  by  considering  any  sequence  dPn(x)  -^+ 
dP(x)  and  showing  I(9,Pn,F)  — ►  1(9,  P,F).  Since  x  belongs  to  the  finite  set  A, 
weak  convergence  is  equivalent  to  convergence  in  any  finite-dimensional  metric. 
Now 

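\[ |I(\theta, P_n, F) - I(\theta, P, F)| = \Big| \sum_{x,y} P_n(x) p(y|x,\theta) \log \frac{p(y|x,\theta)}{\sum_x P_n(x) p(y|x,\theta)} - \sum_{x,y} P(x) p(y|x,\theta) \log \frac{p(y|x,\theta)}{\sum_x P(x) p(y|x,\theta)} \Big| \]

\[ \le \Big| \sum_{x,y} P_n(x) p(y|x,\theta) \log \frac{p(y|x,\theta)}{\sum_x P_n(x) p(y|x,\theta)} - \sum_{x,y} P_n(x) p(y|x,\theta) \log \frac{p(y|x,\theta)}{\sum_x P(x) p(y|x,\theta)} \Big| \]
\[ \quad + \Big| \sum_{x,y} P_n(x) p(y|x,\theta) \log \frac{p(y|x,\theta)}{\sum_x P(x) p(y|x,\theta)} - \sum_{x,y} P(x) p(y|x,\theta) \log \frac{p(y|x,\theta)}{\sum_x P(x) p(y|x,\theta)} \Big| \]

\[ \le \sum_{x,y} P_n(x) p(y|x,\theta) \Big| \log \frac{\sum_x P_n(x) p(y|x,\theta)}{\sum_x P(x) p(y|x,\theta)} \Big| + \sum_x D\, |P_n(x) - P(x)| \tag{2.40} \]

where $D = \max_{x,y}\, p(y|x,\theta) \log \frac{p(y|x,\theta)}{\sum_x P(x) p(y|x,\theta)}$,

\[ \le \sum_y \Big| \log \frac{\sum_x P_n(x) p(y|x,\theta)}{\sum_x P(x) p(y|x,\theta)} \Big| + \sum_x D\, |P_n(x) - P(x)|. \tag{2.41} \]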
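Again since $A$ is finite we can say that for all $\delta > 0$ there exists $N$ such that for all $n > N$

\[ 1 - \delta < \frac{P_n(x)}{P(x)} < 1 + \delta \quad \forall x \in A \]

\[ 1 - \delta < \frac{P_n(x) p(y|x,\theta)}{P(x) p(y|x,\theta)} < 1 + \delta \quad \forall x \in A \]

\[ 1 - \delta < \frac{\sum_x P_n(x) p(y|x,\theta)}{\sum_x P(x) p(y|x,\theta)} < 1 + \delta \quad \forall y. \]

By the continuity of the log function we can say that for all $\epsilon > 0$ there exists $\delta > 0$ such that

\[ \sum_y \Big| \log \frac{\sum_x P_n(x) p(y|x,\theta)}{\sum_x P(x) p(y|x,\theta)} \Big| \le \epsilon. \tag{2.42} \]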

The second term in (2.41) can also clearly be made $< \epsilon$ for sufficiently large $n$. Thus the continuity of $I(\theta, P, F)$ w.r.t. $P$ is confirmed and the first part of the theorem follows. The bound on the number of points of support of $dF^*$ follows from Theorem 1(a). The necessary and sufficient conditions are derived as before from Theorem 3 and well-known results about channel capacity [Gall 68, pg.91]. □


2.6  Channel  Cutoff  Rate 


In this section we show how the results obtained for channel capacity also carry over when the performance measure we choose is $R_0$, the channel cutoff rate. For a channel given by a transition probability matrix $p(y|x)$, $R_0$ is defined as

\[ R_0 = \max \left( -\log E(J(X_1, X_2)) \right) \]

where the maximization is over all distributions $dP(x)$ on the input, $X_1$ and $X_2$ are independent random variables with distribution $dP(x)$ and

\[ J(x_1, x_2) = \sum_{y \in B} \sqrt{p(y|x_1)\, p(y|x_2)}. \]

The cutoff rate is the largest number for which there is a linear error exponent, viz. a bound of the form $P_e \le 2^{-n(R_0 - R)}$ on the error probability of the best code of length $n$ and rate $R$, which is true for all $R < R_0$. It is the rate beyond which sequential decoding of convolutional codes becomes intractable and is widely interpreted to be the largest rate at which “practical” coding systems can be implemented [Vite 79], [Mass 80].
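As a minimal numerical sketch of the definition above, the following evaluates $-\log E(J(X_1, X_2))$ for one fixed input distribution on a finite channel; it does not carry out the maximization over $dP(x)$. The function names, the list-of-rows channel layout and the binary symmetric channel example are illustrative assumptions, not part of the original text.

```python
import math

def cutoff_rate(input_dist, channel, base=2.0):
    """Value of -log E[J(X1, X2)] for a fixed input distribution.

    channel[x][y] holds p(y|x). Uses the identity
    E[J(X1, X2)] = sum_y (sum_x P(x) * sqrt(p(y|x)))**2,
    valid because X1 and X2 are independent with the same distribution.
    """
    total = 0.0
    for y in range(len(channel[0])):
        inner = sum(p_x * math.sqrt(row[y]) for p_x, row in zip(input_dist, channel))
        total += inner * inner
    return -math.log(total) / math.log(base)

def bsc(crossover):
    """Binary symmetric channel matrix (illustrative helper)."""
    return [[1.0 - crossover, crossover], [crossover, 1.0 - crossover]]

# Uniform inputs on a BSC with crossover 0.1, in bits per channel use.
r0_uniform = cutoff_rate([0.5, 0.5], bsc(0.1))
```

For the binary symmetric channel with uniform inputs this agrees with the familiar closed form $1 - \log_2(1 + 2\sqrt{p(1-p)})$.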

For the compound channel the appropriate $R_0$ to use is

\[ \max_{dP(x)} \min_{p(y|x) \in \mathcal{W}} \left( -\log E(J(X_1, X_2)) \right) \]

where $\mathcal{W}$ is the channel set. This is the largest number for which there is a linear error exponent for the compound channel. This follows from the random coding exponent function and the sphere packing exponent function for the compound channel [Csiz 81, Lemma 5.4 and Theorem 5.10]. Thus in our game-theoretic formulation the communicator wants to choose $(dP(x), dG(\theta))$ to achieve

\[ \max_{dP(x), dG(\theta)} \min_{dF(z)} R_0(G, P, F) \]

where

\[ R_0(G, P, F) \triangleq -\log \left( E \left( \sum_y \sqrt{ \iint p(y|x_1, z, \theta)\, dG(\theta)\, dF(z) \iint p(y|x_2, z, \theta)\, dG(\theta)\, dF(z) } \right) \right) \]

and the jammer chooses $dF(z)$ to attain

\[ \min_{dF(z)} \max_{dP(x)} R_0(G, P, F). \]
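The max-min program for the compound channel can be illustrated on a toy case. The sketch below assumes a binary input and a finite list of candidate channels (a simplification: in the text the jammer ranges over distributions $dF(z)$, not a finite list) and finds the communicator's input distribution by grid search. All names are hypothetical.

```python
import math

def expected_j(a, channel):
    """E[J(X1, X2)] for a binary input with P(0) = a; channel[x][y] = p(y|x)."""
    return sum(
        (a * math.sqrt(channel[0][y]) + (1.0 - a) * math.sqrt(channel[1][y])) ** 2
        for y in range(len(channel[0]))
    )

def compound_cutoff_rate(channels, steps=1000):
    """Grid search for max over P of min over channels of -log2 E[J].

    Returns (rate in bits, maximizing value of P(0)).
    """
    best_rate, best_a = -math.inf, 0.0
    for k in range(steps + 1):
        a = k / steps
        worst = min(-math.log2(expected_j(a, ch)) for ch in channels)
        if worst > best_rate:
            best_rate, best_a = worst, a
    return best_rate, best_a

def bsc(crossover):
    """Binary symmetric channel matrix (illustrative helper)."""
    return [[1.0 - crossover, crossover], [crossover, 1.0 - crossover]]

rate, a_star = compound_cutoff_rate([bsc(0.1), bsc(0.2)])
```

For two binary symmetric channels the uniform input is optimal against both, so the max-min value is simply the cutoff rate of the noisier channel.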

In case AI we are able to derive results similar to the previous case. Theorem 8 below recovers the same result for $R_0(G, P, F)$ as Theorem 1 did for mutual information.

Theorem 8: a) The jammer can achieve the minimum in

\[ \max_{(dG(\theta), dP(x))} \min_{dF(z)} R_0(G, P, F) \]

with a distribution concentrated at at most $M(L - 1) + 2$ points.

b) The communicator can achieve the maximum in

\[ \min_{dF(z)} \max_{(dG(\theta), dP(x))} R_0(G, P, F) \]

with a distribution concentrated at at most $M(L - 1) + 1$ points.

Proof of Theorem 8:

As in the proof of Lemma 1 we note that $P_{yx}$ is a Levy-continuous function of $dF(z)$. From the functional form of $R_0(G, P, F)$ it is clear that it is a continuous function of $P_{yx}$ (it is easy to see that for all possible $P_{yx}$, the argument of the log can never be 0). Hence we have that $R_0(G, P, F)$ is a Levy-continuous functional of $dF(z)$ for any $(dG(\theta), dP(x))$. The rest of the proof follows exactly the corresponding steps in the proof of Theorem 1 with $R_0(G, P, F)$ replacing $I(G, P, F)$. For part (b) we point out that $R_0(G, P, F)$ is a Levy-continuous functional of $(dG(\theta), dP(x))$ and proceed as before. □

Furthermore, we can also derive results similar to Theorems 2, 3 and 4 with $R_0(G, P, F)$ as the objective function. We do this in Theorems 9, 10 and 11 whose proofs we sketch briefly.

Given any $(dG(\theta), dP(x))$ and the power constraint we define

\[ U_R(K_J, G) = \sup_{F \in S,\ \int f\, dF \le K_J} -R_0(G, P, F) \]

and as before

\[ D(F) = \int_K f(z)\, dF(z) - K_J. \]


Theorem 9: $U_R(K_J, G)$ is achieved by a distribution $F_0 \in S$ satisfying $D(F_0) \le 0$ and a necessary and sufficient condition for $U_R(K_J, G) = -R_0(G, P, F_0)$ is that for some constant $\lambda \ge 0$

\[ \int_K [q(z; F_0, G) - \lambda f(z)]\, dF(z) \le T_0(G, P, F_0) - \lambda K_J \tag{2.44} \]

where

\[ q(z; F_0, G) = E \left( \sum_y \frac{ \left[ \int p(y|x_2, z)\, dF_0 \right] p(y|x_1, z) + \left[ \int p(y|x_1, z)\, dF_0 \right] p(y|x_2, z) }{ 2 \sqrt{ \int p(y|x_1, z)\, dF_0 \int p(y|x_2, z)\, dF_0 } } \right). \]

Also

\[ T_0(G, P, F) = \exp(-R_0(G, P, F)) \]

and the expectation is w.r.t. independent random variables $X_1, X_2$ with common distribution $dP(x)$.

Proof of Theorem 9:

Given $(dG, dP)$ we have from the definition of $R_0$ that

\[ \min_F R_0(G, P, F) = -\max_F (-R_0(G, P, F)) = -\max_F \log(E(J(X_1, X_2))). \]

Maximizing $-R_0(G, P, F)$ is equivalent to maximizing $\exp(-R_0(G, P, F))$ which is the same as maximizing $E(J(X_1, X_2)) = E(\sum_y \sqrt{p(y|x_1) p(y|x_2)})$. For notational purposes let us denote $\exp(-R_0(G, P, F))$ as $T_0(G, P, F) = E(\sum_y \sqrt{p(y|x_1) p(y|x_2)})$. We show that $T_0(G, P, F)$ is a convex-cap (concave) functional of $F$. Let

\[ p^1(y|x_1) = \int_K p(y|x_1, z)\, dF_1(z) \]
\[ p^2(y|x_1) = \int_K p(y|x_1, z)\, dF_2(z) \]
\[ p^1(y|x_2) = \int_K p(y|x_2, z)\, dF_1(z) \]
\[ p^2(y|x_2) = \int_K p(y|x_2, z)\, dF_2(z). \]

Since $a + b \ge 2\sqrt{ab}$ we have

\[ p^1(y|x_1) p^2(y|x_2) + p^2(y|x_1) p^1(y|x_2) \ge 2 \sqrt{ p^1(y|x_1) p^2(y|x_2)\, p^2(y|x_1) p^1(y|x_2) } \]

where $p^1(y|x_1) p^2(y|x_2)$ may be chosen as $a$ and $p^2(y|x_1) p^1(y|x_2)$ may be chosen as $b$.

Therefore

\[ \alpha^2 p^1(y|x_1) p^1(y|x_2) + (1 - \alpha)^2 p^2(y|x_1) p^2(y|x_2) + \alpha(1 - \alpha) p^1(y|x_1) p^2(y|x_2) + \alpha(1 - \alpha) p^2(y|x_1) p^1(y|x_2) \]


Theorem 11: There exists a conservative strategy $(dG(\theta), dP(x))$ for the communicator and a conservative strategy $dF(z)$ for the jammer, i.e. strategies such that

i) \[ \max_{dP(x), dG(\theta)} R_0(G, P, F) = \min_{dF(z)} \max_{dP(x), dG(\theta)} R_0(G, P, F) \]

ii) \[ \min_{dF(z)} R_0(G, P, F) = \max_{dP(x), dG(\theta)} \min_{dF(z)} R_0(G, P, F). \]

Proof of Theorem 11:

Again, this parallels almost exactly the proof of Theorem 4. □

We also note here that results similar to those derived in the case with mutual information as our objective function hold in cases BI, CI and DI with $R_0(G, P, F)$ as the objective function. However we cannot achieve a saddle-point for the case with side information (with randomized quantizers and “compatibility”) because $R_0(G, P, F)$ is not necessarily convex in $F$ and such convexity is essential for any saddle-point to exist. If, on the other hand, we give up randomization of the quantizer (and do not assume “compatibility”) then we once again have a saddle-point with the jammer's saddle-point distribution having finite support. This is stated in Theorem 12.


Theorem 12: There exists a pair of distributions $(dP^*(x), dF^*(z))$ such that

\[ R_0(G, P, F^*) \le R_0(G^*, P^*, F^*) \le R_0(G^*, P^*, F) \tag{2.47} \]

for all feasible $(dP, dG)$. Moreover, $dF^*(z)$ can be chosen to be concentrated at at most $M(L - 1) + 2$ points and necessary and sufficient conditions for $dF^*(z)$ and $dP^*(x)$ are that for some $\lambda_1, \lambda_2 \ge 0$

\[ q(z; F^*, \theta) \le T_0(\theta, P^*, F^*) + \lambda_1 (f(z) - K_J) \tag{2.48} \]

for all $z \in K$. Also

\[ q(z; F^*, \theta) = T_0(\theta, P^*, F^*) + \lambda_1 (f(z) - K_J) \tag{2.49} \]

for all $z \in E_0$, where $E_0$ denotes the points of increase of $F^*$ and

\[ T_0(\theta, P, F) = \exp(-R_0(\theta, P, F)) \]

and 

'£.\/p(yF«'Wy,P')  =  ^  (2-50) 

y 

for  all  x  3  P*(x)  =  0  where  0(y,  P)  =  P(x)p(y\x,  9)  and 

X 

(2.51) 

y 

for  all  x  3  P’(x)  >  0. 

Proof of Theorem 12:

We work with $T_0(\theta, P, F)$ instead of $R_0(\theta, P, F)$. We can do this because

\[ R_0(\theta, P, F^*) \le R_0(\theta, P^*, F^*) \le R_0(\theta, P^*, F) \]
\[ \iff T_0(\theta, P^*, F) \le T_0(\theta, P^*, F^*) \le T_0(\theta, P, F^*) \]

and so a saddle-point for $R_0$ is a saddle-point for $T_0$ and vice versa. Obviously $T_0(\theta, P, F)$ is continuous in $P$ and continuous in $F$. Moreover from the proof of Theorem 9 we know $T_0(\theta, P, F)$ is convex-cap (concave) in $F$.

Furthermore $T_0(\theta, P, F)$ is convex in $P$ [Vite 79, pg.140]. Using the Sion minimax theorem as in Theorem 5, the first part of this theorem follows. The bound on the number of points of support follows from Theorem 8. The necessary and sufficient conditions follow from Theorem 10 and well-known properties of the optimizing distributions for the channel cutoff rate [Vite 79, pg.205]. □
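The two curvature properties used in this proof (concavity of $T_0$ in $F$ and convexity in $P$) can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the quantizer is fixed, alphabets are small and finite, and a mixture of jammer distributions is represented directly by the corresponding mixture of induced transition matrices, since $\int p(y|x, z)\, dF(z)$ is linear in $F$. All names are hypothetical.

```python
import math
import random

def t0(input_dist, channel):
    """T0 = sum_y (sum_x P(x) * sqrt(p(y|x)))**2, with channel[x][y] = p(y|x)."""
    total = 0.0
    for y in range(len(channel[0])):
        inner = sum(p * math.sqrt(row[y]) for p, row in zip(input_dist, channel))
        total += inner * inner
    return total

def random_dist(size, rng):
    """Random probability vector with strictly positive entries."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    norm = sum(weights)
    return [w / norm for w in weights]

def mix(u, v, alpha):
    """Elementwise alpha * u + (1 - alpha) * v for vectors or matrices."""
    if isinstance(u[0], list):
        return [mix(a, b, alpha) for a, b in zip(u, v)]
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(u, v)]

def check_curvature(trials=200, inputs=3, outputs=4, seed=0, tol=1e-12):
    """Return True if no random trial violates concavity in F or convexity in P."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, q = random_dist(inputs, rng), random_dist(inputs, rng)
        w1 = [random_dist(outputs, rng) for _ in range(inputs)]
        w2 = [random_dist(outputs, rng) for _ in range(inputs)]
        alpha = rng.random()
        # Concavity in F: mixing jammer distributions mixes the induced channels.
        if t0(p, mix(w1, w2, alpha)) < alpha * t0(p, w1) + (1 - alpha) * t0(p, w2) - tol:
            return False
        # Convexity in P for a fixed channel.
        if t0(mix(p, q, alpha), w1) > alpha * t0(p, w1) + (1 - alpha) * t0(q, w1) + tol:
            return False
    return True
```

A random search of this kind cannot prove either property; it only fails to find a counterexample, which is consistent with the argument in the proof of Theorem 9 and with [Vite 79, pg.140].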

2.7  Conclusions 


We have constructed fairly general channel models which are capable of representing a number of jamming situations. The jammers we have considered have all been non-adaptive and using results from the compound channel we are able to give operational significance to our minimax performance measures, i.e. we can assert the existence of encoders and decoders which can perform at arbitrarily low probabilities of error at rates close to our performance measures. Our analysis is also clearly applicable to many restrictions on the jammer's strategy set other than the ones we have considered.

In the case with the decoder uninformed (case I) we have shown that the worst-case jammer strategy (from the communicator's perspective) as well as the worst-case communicator strategy (from the jammer's perspective) need only be one of the class of distributions with support on a finite number of points. We have a bound on the number of these points of support in terms of the sizes of the input and the output alphabet. Thus we have reduced the computation of the worst case jamming strategies to a finite-dimensional non-linear programming problem. Moreover we can characterize these distributions by necessary and sufficient conditions which are fairly easy to test. All the above has been done for both objective functions: mutual information, which tells us about the fundamental limits to communication in these situations, as well as the channel cutoff rate, which tells us about the ‘practical’ limits to such communication.

In the cases with decoder informed we reduce the communicator's strategy set (either by using the “compatibility” assumption or by fixing a quantizer). In this case when we have convexity with respect to the jammer's strategy (as in cases AII and BII) we are able to demonstrate the existence of a saddle-point strategy. For the case with non-randomized quantization we are further able to characterize these saddle-point strategies using the earlier theory.

As we have mentioned earlier, all the above presupposes non-adaptive jamming. The compound channel model which we use indirectly by our choice of objective function is appropriate to use in this case. We can allow for more sophisticated jammers if we incorporate the cases where the jammer's strategies are allowed to depend on the previous (and present) channel inputs. The appropriate channel model to use then is that of the arbitrarily “star” varying channel (A*VC) [Csiz 81, pg.233]. This model generalizes the arbitrarily varying channel (AVC) and includes it as a special case. It is known that the m-capacity (i.e. capacity with maximum probability of error over all the codewords) of the A*VC is the same as that of the corresponding AVC [Csiz 81, pg.232]. This capacity is known for the case of binary output alphabet (and finite input alphabet) and is known to be equal to

\[ \max_{dP(x)} \min_{W \in \overline{\overline{\mathcal{W}}}} I(X; Y) \]

where $X$ and $Y$ are the input and the output respectively, $W$ is any channel chosen from the set of channels $\mathcal{W}$ and $\overline{\overline{\mathcal{W}}}$ is the row-convex closure of $\mathcal{W}$ [Csiz 81]. In our case the set of channels corresponding to the jammer's strategy set is already row-convex closed and hence the appropriate programs would be

a) For the communicator:

\[ \max_{(dG(\theta), dP(x))} \min_{dF(z)} I(G, P, F) \]

b) For the jammer:

\[ \min_{dF(z)} \max_{(dG(\theta), dP(x))} I(G, P, F) \]

which is the same objective function as we have used. Similarly, in the case with decoder informed we would obtain the same objective functions. Thus, all the results derived in the previous chapter for the case of mutual information can be extended to the case of the A*VC channel with binary output. This model may be viewed as a worst-case representation of adaptive jamming. Unfortunately the m-capacity of the AVC is as yet unknown for output sizes greater than 2. On the other hand the a-capacity of the AVC (i.e. the capacity with average probability of error) is known to be either 0 or else

\[ \max_{dP(x)} \min_{W \in \overline{\mathcal{W}}} I(X; Y) \]

where $\overline{\mathcal{W}}$ is the convex closure of the set $\mathcal{W}$ to which $W$ belongs [Csiz 81, pg.214]. Since in our model the set of channels is convex as well as row-convex the a-capacity is known to be greater than 0 iff the m-capacity is greater than 0 [Ahls 78]. Thus with average probability of error, whenever the jammer's strategy set is such that he cannot force the capacity to be 0, all the results of the preceding chapter extend to the case of the A*VC channel.


CHAPTER  III 


PERFORMANCE  OF  ORTHOGONAL  SIGNALLING 
IN  UNKNOWN  PARTIAL-BAND  INTERFERENCE 


3.1  Introduction 

In this chapter we investigate the performance of simple signalling and demodulation schemes over the partial-band jammed channel. When communicating over the additive white Gaussian noise channel, orthogonal signalling suffices asymptotically to achieve capacity, i.e. by choosing $M$ large enough the error probability can be made arbitrarily small for all rates less than capacity or equivalently provided that the ratio of the energy transmitted per information bit and the one-sided power spectral density, $E_b/N_0$, is greater than $\ln 2$. Conversely no other signals can achieve arbitrarily small error probability when $E_b/N_0 < \ln 2$. It is also known [Stark 85a], [Stark 85b] that, provided codes of small enough rate are used, the capacity of a partial-band jammed channel is the same as that of a white Gaussian noise channel. In the light of this it seems plausible to expect that for the partial-band jammed (PBJ) channel, orthogonal signals with correlation detection, which suffice in the white Gaussian case, could be used as a simple scheme to form a reliable communication system. Unfortunately this turns out not to be true and in Section 2 we demonstrate this by considering the limiting value of the error probability of orthogonal signals in worst case two-level partial-band jamming. For any $E_b/N_J$ the asymptotic probability of error is not zero. The analysis shows that for large $E_b/N_J$ the worst-case jammer $\rho$ is very small. Hence simple redundancy in the form of diversity is next analyzed with majority logic decoding as well as linear combining at the receiving end. Both schemes turn out to be insufficient. The linear combining case indicates that when the outputs of the diversity transmissions are summed up, small values of $\rho$ [...] the number of diversity transmissions play a significant part in degrading the performance when using the sum statistic. This naturally suggests clipped linear combining, wherein we clip the output of each diversity transmission, as a more effective combining technique. The rationale [...] is that our probability of error will decrease because now the infrequent transmissions with the large noise components will affect our sum much less. To allow for greater generality we allow the clipping level to be a function of $L$. Our analysis indicates that for this case we recapture the same threshold phenomenon with orthogonal signals that we had for the AWGN channels. We do this by use of certain Central Limit Theorem approximations. Since such a threshold phenomenon is very sensitive to the kind of approximation used, we need to use a powerful non-uniform “Berry-Esseen” bound due to Michel and a less well known form of the Central Limit Theorem due to Sirazhdinov and Mamatov. Our analysis shows that provided a certain relation is satisfied by the clipping level, the number of diversity transmissions and the number of signals, then, asymptotically, orthogonal signalling with diversity and clipped linear combining suffices to achieve capacity over the partial-band jammed channel.




3.2  Orthogonal  Signaling  over  the  AWGN  Channel 


We consider the AWGN channel with one-sided noise power spectral density $N_0$ watts/Hz and consider a set of $M$ equi-energy orthogonal signals $s_i(t)$, $i = 1, \ldots, M$, limited to time duration $T$ seconds and with average received power $S$ watts. Thus the energy in each signal will be $E = ST$ joules and the orthonormal basis functions can be conveniently chosen as $\phi_i(t) = \frac{1}{\sqrt{E}} s_i(t)$ ($\int_0^T \phi_m(t) \phi_n(t)\, dt = \delta_{mn}$, $m, n = 1, \ldots, M$, where $\delta_{mn} = 1$, $m = n$, and $\delta_{mn} = 0$, $m \ne n$). Using coherent correlation detection the error probability $P_e$ is known to be

\[ P_e\!\left(\frac{E_b}{N_0}, M\right) = 1 - \int_{-\infty}^{\infty} (\Phi(u))^{M-1} \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{1}{2} \left( u - \sqrt{\frac{2E}{N_0}} \right)^2 \right) du \]

where $E_b = \frac{E}{\log_2 M}$ = energy per bit and $\Phi(u)$ is the distribution function of a standard normal random variable.

It is well known that [Vite 66]

\[ \lim_{M \to \infty} P_e\!\left(\frac{E_b}{N_0}, M\right) = \begin{cases} 1 & \frac{E_b}{N_0} < \ln 2 \\ 0 & \frac{E_b}{N_0} > \ln 2. \end{cases} \]
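The integral for $P_e$ has no closed form for general $M$, but it is easy to evaluate numerically. The following is a rough sketch using the trapezoidal rule on the equivalent form $\int (1 - \Phi(u)^{M-1})\, \varphi(u - \mu)\, du$ with $\mu = \sqrt{2E/N_0}$; the function names, step count and truncation width are assumptions chosen for illustration.

```python
import math

def std_normal_cdf(u):
    """Distribution function of a standard normal random variable."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))

def pe_orthogonal(eb_over_n0, m, steps=4000, half_width=10.0):
    """Error probability of M coherent orthogonal signals on the AWGN channel.

    Integrates (1 - Phi(u)**(M-1)) * phi(u - mu) over [mu - w, mu + w]
    with mu = sqrt(2 * E / N0) and E = Eb * log2(M), by the trapezoidal rule.
    """
    mu = math.sqrt(2.0 * eb_over_n0 * math.log2(m))
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        u = mu - half_width + k * h
        density = math.exp(-0.5 * (u - mu) ** 2) / math.sqrt(2.0 * math.pi)
        weight = 1.0 if 0 < k < steps else 0.5  # trapezoid end-point weights
        total += weight * (1.0 - std_normal_cdf(u) ** (m - 1)) * density
    return total * h

# Illustrative values on either side of the ln 2 threshold, M = 1024.
below_threshold = pe_orthogonal(0.2, 1024)
above_threshold = pe_orthogonal(2.0, 1024)
```

For $M = 2$ the result can be compared against the known value $Q(\sqrt{E/N_0})$. Convergence to the 0 or 1 limit is slow in $M$, so finite-$M$ values only hint at the threshold at $\ln 2$.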

Now Shannon's formula for the capacity $C$ of a channel of bandwidth $W$ perturbed by additive Gaussian noise of uniform spectral density $N_0$ and signal power $S$ is

\[ C = W \log_2\!\left( 1 + \frac{S}{N_0 W} \right). \]

If we let the bandwidth approach infinity (which is required if we let the number of orthogonal signals increase to infinity) then we have that

\[ \lim_{W \to \infty} C = \frac{S}{N_0 \ln 2}. \]

The channel coding theorem asserts that for reliable communication over this channel the rate $R$ (bits/sec) must satisfy

\[ R < \frac{S}{N_0 \ln 2}. \]

Now if $T_b$ denotes the bit duration, i.e. $T_b = \frac{1}{R}$, then this condition is equivalent to $\frac{E_b}{N_0} > \ln 2$. The converse to the above theorem asserts that for all signal sets such that $R > \frac{S}{N_0 \ln 2}$, which is equivalent to $\frac{E_b}{N_0} < \ln 2$, the error probability approaches 1.
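The infinite-bandwidth limit and the resulting $\ln 2$ threshold can be checked with a few lines of arithmetic. This is a small sketch with hypothetical function names; capacity is taken in bits/sec, so the logarithm is base 2.

```python
import math

def awgn_capacity(signal_power, n0, bandwidth):
    """Shannon capacity W * log2(1 + S / (N0 * W)) in bits/sec."""
    return bandwidth * math.log1p(signal_power / (n0 * bandwidth)) / math.log(2.0)

def infinite_bandwidth_capacity(signal_power, n0):
    """Limit of the capacity as W grows without bound: S / (N0 * ln 2)."""
    return signal_power / (n0 * math.log(2.0))

def eb_over_n0(signal_power, n0, rate):
    """Energy per bit over N0 when Tb = 1 / R, i.e. Eb = S / R."""
    return signal_power / (rate * n0)

c_wide = awgn_capacity(1.0, 1.0, 1e6)
c_limit = infinite_bandwidth_capacity(1.0, 1.0)
```

Signalling exactly at the limiting rate gives $E_b/N_0 = \ln 2$, which is the boundary in the threshold statement above.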

However, as we show in the following section, when we have partial-band jamming, orthogonal signals perform much more poorly.


3.3  Orthogonal  Signaling  in  Partial-band  Jamming 


We show in this section that the following partial-band jamming strategy causes the error probability to be non-zero asymptotically for any value of $\frac{E_b}{N_J + N_0}$ ($N_J$ is the one-sided power spectral density of the jamming noise). First we describe the model for the signalling and for the frequency hopper and dehopper and the background and jamming noise.

Let $\{\phi_j(t)\}_{j=1}^{M}$, $0 \le t \le T$, be a set of orthonormal signals with $s_j(t) = \sqrt{E}\, \phi_j(t)$ being the signal transmitted corresponding to symbol $j$. This signal is frequency hopped over $q$ frequencies with one symbol per hop and transmitted over a partial-band jammed channel as $\tilde{s}_j(t)$.

The jamming signal $j(t)$ at the receiver is modelled as a weighted sum of bandpass Gaussian processes $j(t) = \sum_{i=1}^{q} Z_i(t) j_i(t)$ where $j_i(t)$ is a Gaussian random process with zero mean and spectral density (one-sided) $N_J$ over a bandwidth $W/q$


Hz  where  W  is  the  total  spread  bandwidth  of  the  transmitted  signal.  In  the  sub¬ 
sequent  analysis  W/q  is  assumed  to  be  much  larger  than  j;.  Assume  that  each 
M-ary band lies entirely within or without the $W/q$ bandwidth support of some $j_i(t)$ (this is a pessimistic assumption and our probabilities of error are higher than without this assumption). Also assume that the spectral density of $j_i(t)$, $S_i(f)$, is such that $S_i(f) S_k(f) = 0$ for all $f$ and $i \ne k$. Thus $j_i(t)$ and $j_k(t)$ are independent random processes for $i \ne k$. $Z_i(t)$ is a sequence of non-overlapping pulses of duration $T$, i.e. $Z_i(t) = Z_{i,m}$, $mT \le t < (m + 1)T$. The jammer has the freedom to choose the distribution of the random variables $Z_{i,m}$ subject to an average power constraint:

\[ E\Big( \sum_{i=1}^{q} Z_{i,m}^2 \Big) \le q. \]

The partial band jammer chooses the following distribution for $Z_{i,m}$:

\[ \Pr(Z_{i,m} = 0) = 1 - \rho, \qquad 0 \le \rho \le 1 \]

\[ \Pr(Z_{i,m} = 1/\sqrt{\rho}) = \rho \]

where $\rho$ is a constant representing the fraction of band which has interference. Thus when the jammer is on $j_i(t)$ has noise spectral density $N_J/\rho$ and when the jammer is off $j_i(t)$ has noise spectral density 0. $Z_{i,m}$ are i.i.d. for each $i$ and $Z_{i,m}$ is independent of $j_i(t)$.

The received signal is thus

\[ r(t) = \tilde{s}(t) + j(t) + n(t) \]

where $\tilde{s}(t)$ is the frequency-hopped transmitted signal and $n(t)$ is the thermal noise, which is a white Gaussian process with one-sided spectral density $N_0/2$. The signal is dehopped by a frequency dehopper whose output $r_d(t)$ can be written as:

\[ r_d(t) = s(t) + \sum_{k=1}^{q} \delta(v_k, f(t)) Z_k(t) \tilde{j}_k(t) + n(t) \]

where $\tilde{j}_k(t)$ is $j_k(t)$ after frequency translation, $\{v_k\}_{k=1}^{q}$ is the set of frequencies hopped to and $f(t)$ is the hopping pattern, i.e.

\[ f(t) = f_j, \quad jT \le t < (j + 1)T, \quad f_j \in \{v_1, \ldots, v_q\}. \]

The demodulator processes the received signal by computing the $M$-dimensional vector $y = (y_1, \ldots, y_M)$ where

\[ y_i = \int_0^T r_d(t) \phi_i(t)\, dt. \]

Now assume symbol $j$ was sent, i.e. $s_j(t)$ was sent. Then for $i \ne j$

\[ y_i = \int_0^T \sum_{k=1}^{q} \delta(v_k, f(t)) Z_k(t) \tilde{j}_k(t) \phi_i(t)\, dt + \int_0^T n(t) \phi_i(t)\, dt \]

and for $i = j$

\[ y_j = \int_0^T \sum_{k=1}^{q} \delta(v_k, f(t)) Z_k(t) \tilde{j}_k(t) \phi_j(t)\, dt + \sqrt{E} + \int_0^T n(t) \phi_j(t)\, dt. \]

Thus for $i \ne j$

\[ y_i = Z_i \eta_i + N_i \]

and for $i = j$

\[ y_j = \sqrt{E} + Z_j \eta_j + N_j \]

where $N_i$, $i = 1, \ldots, M$ are i.i.d. Gaussian random variables with mean 0 and variance $N_0/2$ and the $\eta_i$, $i = 1, \ldots, M$ are i.i.d. Gaussian random variables with mean 0 and variance $N_J/2$. Thus by conditioning on $Z_i$ and then averaging, the error probability may be written as

\[ P_e(\rho, E_b, N_0, N_J, M) = (1 - \rho) P_e\!\left( \frac{E_b}{N_0}, M \right) + \rho P_e\!\left( \frac{E_b}{N_0 + \frac{N_J}{\rho}}, M \right). \]

For the worst-case partial-band jammer the error probability is

\[ P_e(E_b, N_0, N_J, M) = \sup_{0 \le \rho \le 1} P_e(\rho, E_b, N_0, N_J, M). \]

For $\rho = 0$, $P_e(\rho, E_b, N_0, N_J, M) = P_e\!\left( \frac{E_b}{N_0}, M \right)$.
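The supremum over $\rho$ is a one-dimensional search and can be approximated on a grid. The sketch below is an illustration only: it re-implements the AWGN error probability by the trapezoidal rule and then evaluates the two-term mixture above at each grid point. The function names, grid size and integration parameters are assumptions.

```python
import math

def pe_orthogonal(eb_over_n0, m, steps=2000, half_width=10.0):
    """AWGN error probability of M coherent orthogonal signals (trapezoidal rule)."""
    mu = math.sqrt(2.0 * eb_over_n0 * math.log2(m))
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        u = mu - half_width + k * h
        cdf = 0.5 * math.erfc(-u / math.sqrt(2.0))
        density = math.exp(-0.5 * (u - mu) ** 2) / math.sqrt(2.0 * math.pi)
        weight = 1.0 if 0 < k < steps else 0.5
        total += weight * (1.0 - cdf ** (m - 1)) * density
    return total * h

def pe_partial_band(rho, eb, n0, nj, m):
    """(1 - rho) * Pe(Eb/N0, M) + rho * Pe(Eb/(N0 + NJ/rho), M); assumes n0 > 0."""
    clear = pe_orthogonal(eb / n0, m)
    if rho == 0.0:
        return clear
    jammed = pe_orthogonal(eb / (n0 + nj / rho), m)
    return (1.0 - rho) * clear + rho * jammed

def worst_case_rho(eb, n0, nj, m, grid=50):
    """Grid search over rho in [0, 1]; returns (rho, error probability)."""
    candidates = [k / grid for k in range(grid + 1)]
    best = max(candidates, key=lambda rho: pe_partial_band(rho, eb, n0, nj, m))
    return best, pe_partial_band(best, eb, n0, nj, m)

rho_star, pe_star = worst_case_rho(20.0, 1.0, 1.0, 16)
```

With a large $E_b/N_J$ the maximizing $\rho$ falls strictly inside $(0, 1)$: concentrating the jamming power on a fraction of the band does more damage than spreading it evenly.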

Now  we  show  that  for  any  signal-to-noise  ratio  the  worst  case  jammer  can 
ensure  that  the  asymptotic  probability  of  error  does  not  go  to  zero.  This  is  stated 
precisely  in  the  following  Theorem. 
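Theorem 1:

i) For $\frac{E_b}{N_0 + N_J} < \ln 2$

\[ \lim_{M \to \infty} \max_{0 \le \rho \le 1} \left[ \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M \right) + (1 - \rho) P_e\!\left( \frac{E_b}{N_0}, M \right) \right] = 1 \tag{3.1} \]

ii) For $\frac{E_b}{N_0 + N_J} > \ln 2$

\[ \lim_{M \to \infty} \max_{0 \le \rho \le 1} \left[ \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M \right) + (1 - \rho) P_e\!\left( \frac{E_b}{N_0}, M \right) \right] = \bar{\rho} \]

where $\bar{\rho}$ is the solution of the equation

\[ \frac{E_b}{\frac{N_J}{\bar{\rho}} + N_0} = \ln 2 \]

i.e.

\[ \bar{\rho} = \frac{\ln 2}{\frac{E_b}{N_J} - \frac{N_0}{N_J} \ln 2}. \]

Note: $\bar{\rho} < 1$ because $\frac{E_b}{N_0 + N_J} > \ln 2$.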
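The closed form for the limiting value in part ii) follows by solving $E_b / (N_J/\rho + N_0) = \ln 2$ for $\rho$. The short sketch below (hypothetical function name) computes it and can be used to confirm that the result satisfies the defining equation and lies in $(0, 1)$ whenever $E_b/(N_0 + N_J) > \ln 2$.

```python
import math

def rho_bar(eb, n0, nj):
    """Solve Eb / (NJ / rho + N0) = ln 2 for rho.

    Requires Eb / (N0 + NJ) > ln 2, which guarantees 0 < rho < 1.
    """
    ln2 = math.log(2.0)
    if eb <= (n0 + nj) * ln2:
        raise ValueError("requires Eb / (N0 + NJ) > ln 2")
    return ln2 / (eb / nj - (n0 / nj) * ln2)

r = rho_bar(10.0, 1.0, 1.0)
```

For example, with $E_b = 10$ and $N_0 = N_J = 1$ the limiting error probability is about 0.074, even though the unjammed ratio $E_b/N_0$ is far above threshold.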

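Proof of Theorem 1:

\[ \lim_{M \to \infty} \max_{0 \le \rho \le 1} \left[ \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M \right) + (1 - \rho) P_e\!\left( \frac{E_b}{N_0}, M \right) \right] \tag{3.2} \]

\[ \ge \lim_{M \to \infty} \max_{0 \le \rho \le 1} \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M \right) \tag{3.3} \]

\[ \ge \lim_{M \to \infty} \max_{0 \le \rho \le 1} \rho P_e\!\left( \frac{E_b}{N_0 + N_J}, M \right) \tag{3.4} \]

(since $P_e(x, M)$ is a decreasing function of $x$) (see Appendix G)

\[ = \lim_{M \to \infty} P_e\!\left( \frac{E_b}{N_0 + N_J}, M \right) \tag{3.5} \]

\[ = 1 \quad \text{for } \frac{E_b}{N_0 + N_J} < \ln 2. \tag{3.6} \]
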
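For any $\epsilon > 0$, let $\rho_1 = \bar{\rho} - \epsilon$ where $\bar{\rho}$ is such that

\[ \frac{E_b}{\frac{N_J}{\bar{\rho}} + N_0} = \ln 2. \]

Then

\[ \liminf_{M \to \infty} \max_{0 \le \rho \le 1} \left[ \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M \right) + (1 - \rho) P_e\!\left( \frac{E_b}{N_0}, M \right) \right] \]

\[ \ge \liminf_{M \to \infty} \max_{0 \le \rho \le 1} \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M \right) \]

\[ \ge \liminf_{M \to \infty} \rho_1 P_e\!\left( \frac{E_b}{\frac{N_J}{\rho_1} + N_0}, M \right) \]

\[ = \rho_1. \]

Since this is true for any $\epsilon > 0$ we can infer that

\[ \liminf_{M \to \infty} \max_{0 \le \rho \le 1} \left[ \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M \right) + (1 - \rho) P_e\!\left( \frac{E_b}{N_0}, M \right) \right] \ge \bar{\rho}. \tag{3.14} \]

Now let $\rho_M$ be the value of $\rho$ which achieves the maximum in

\[ \max_{0 \le \rho \le 1} \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M \right). \]

Then for sufficiently large $M$, $\rho_M \le \bar{\rho}$. Else for every $M$ there would be a $M_1 > M$ and $\rho_{M_1} > \bar{\rho}$ such that

\[ \rho_{M_1} P_e\!\left( \frac{E_b}{\frac{N_J}{\rho_{M_1}} + N_0}, M_1 \right) \ge \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M_1 \right) \tag{3.16} \]

for all $\rho$, $0 \le \rho \le 1$. Now for any $\epsilon > 0$ the right-hand side of (3.16) for sufficiently large $M_1$ ($M_1 > M_0$, say) can be made greater than $\bar{\rho} - \epsilon$. Since the left-hand side of (3.16) is obviously approaching 0 for $\rho_{M_1} > \bar{\rho}$, we have a contradiction. Therefore

\[ \limsup_{M \to \infty} \max_{0 \le \rho \le 1} \left[ \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M \right) + (1 - \rho) P_e\!\left( \frac{E_b}{N_0}, M \right) \right] \le \bar{\rho}. \tag{3.17} \]

From (3.13) and (3.17) it follows that

\[ \lim_{M \to \infty} \max_{0 \le \rho \le 1} \left[ \rho P_e\!\left( \frac{E_b}{\frac{N_J}{\rho} + N_0}, M \right) + (1 - \rho) P_e\!\left( \frac{E_b}{N_0}, M \right) \right] = \begin{cases} 1 & \text{if } \frac{E_b}{N_0 + N_J} < \ln 2 \\ \bar{\rho} & \text{if } \frac{E_b}{N_0 + N_J} > \ln 2. \end{cases} \tag{3.18} \]
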
The jammer thus is able to thwart reliable communication at any signal-to-noise ratio by choosing $\rho$ small enough (but not too small). Since the choice of a small $\rho$ by the jammer allows a number of transmissions to pass through unscathed but corrupts the rest substantially, it seems likely that simple coding such as through diversity may be able to recover some of this loss in rate due to partial-band jamming. We pursue such an investigation in the following section.

3.4  Orthogonal  Signalling  with  Diversity 

In this section we try to use time diversity to overcome the partial-band jammer, i.e. make the worst-case partial-band jammer no more deleterious than a broadband jammer of equivalent power. With diversity $L$, the energy per bit, $E_b$, is $\frac{LE}{\log_2 M}$ and we shall use $E_b'$ to denote $\frac{E}{\log_2 M}$.

Assume that symbol $j$ was sent, i.e. $s_j(t)$ was transmitted. Then using the earlier demodulator we have $L$ $M$-component vector decision variables $y_l = (y_{l,1}, \ldots, y_{l,M})$, $l = 1, \ldots, L$, where for $i \ne j$

\[ y_{l,i} = Z_{l,i} \eta_{l,i} + N_{l,i} \]

and for $i = j$

\[ y_{l,j} = \sqrt{E} + Z_{l,j} \eta_{l,j} + N_{l,j} \]

where $\eta_{l,i}$ is a $N(0, N_J/2)$ Gaussian random variable, $Z_{l,i}$ is 0 with probability $1 - \rho$ and $1/\sqrt{\rho}$ with probability $\rho$, $N_{l,i}$ is a $N(0, N_0/2)$ Gaussian random variable and $Z_{l,i}$, $\eta_{l,i}$ and $N_{l,i}$ are independent. Also, $y_1, \ldots, y_L$ are i.i.d. random vectors.

3.4.1  Majority  Logic  Combining 

We  first  investigate  the  following  majority  logic  decoding  strategy.  The  receiver 
observes  the  output  during  each  interval  of  duration  T  and  picks  i  if  the  output  of 
the  ith  correlator  is  maximum.  At  the  end  of  L  intervals  the  output  of  the  decoder 
is  the  symbol  which  has  been  picked  a  maximum  number  of  times.  We  do  the 
asymptotic  analysis  assuming  the  diversity  L  to  be  an  increasing  function  of  M 
for  sufficiently  large  M. 

Now  in  each  time  interval  of  length  T  the  probability  of  error  (i.e.  the  proba¬ 
bility  that  j  will  not  be  picked)  is 

“  =  v)  +  (i  -  pW§;  m) 

P  +  ■‘»0  iVo 

where  Pe(x,  y)  is  the  probability  of  error  for  y  orthogonal  signals  with  bit  energy  to 
noise  ratio  being  z.  Thus  the  probability  of  j  being  picked  in  each  interval  is  1  —  a 


and  the  probability  of  i  ^  j  being  picked  in  each  interval  is 


a 


M  -  1 


.  Using  the 


above  scheme  we  denote  the  probability  of  error  a s  P^{pi,  E'b,N j,  N0,  M)  where 
Pl  is  used  to  denote  the  jammer’s  choice  of  p  as  a  function  of  L.  Since  the  channel 
is  independent  between  repetitions  and  the  same  input  is  applied  to  the  channel 
during  each  of  the  L  repetitions,  the  outputs  of  the  channel  during  the  L  diversity 
transmissions  are  i.i.d.  random  variables  with  finite  mean.  Thus  we  can  utilize  the 
Weak  Law  of  Large  Numbers  as  L  —*  oo,  i.e.  the  probability  that  the  proportion 


of  times  we  choose  j  out  of  the  L  repetitions  differs  from  1  -  Ql  (where  ql  — 
PlP'{  )  +  (1  -  pL)Pe{jfc,M))  by  e  >  0  goes  to  zero: 

number  of  times  j  is  picked  . 
lim  Prob{| - - — - - - - (1  -  ql)|  >  e}  ->  0 

L-+oo  L 

..  „  ,  ,, number  of  times  j )  is  picked  ql  .  , 

2s,Prob(l - 1 - m3tI>£}-*° 

Thus  we  see  that  using  the  above  decoding  strategy  the  limiting  probability  of 
error  will  be  zero  or  one  according  as 


1  —  oil  > 


M  -  1 


M-  1 


>  1  -  aL 


(3.19) 


(3.20) 


and  we  examine  when  this  is  true.  Condition  (3.19)  may  be  written  as 


M-  1 


Ql  < 


Now  if  xf+No  >  In  2  we  know  from  the  previous  section  that 

+ (1  -  ’ M)  =  iFjbi = * 


(3.21) 


Hence 


pM-srrrr) + a  -  <  ?  <  i. 


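Therefore for all sufficiently large M,

\[ \alpha_L < \frac{M-1}{M}. \]

Thus we can say that

\[ \lim_{M,L\to\infty}\ \sup_{0<\rho_L\le 1} P_e^L(\rho_L, E_b', N_J, N_0, M) = 0. \]

Now assume $E_b'/(N_0 + N_J) < \ln 2$. Condition (3.20) may be written as

\[ \rho_L P_e\!\left(\frac{E_b'}{N_J/\rho_L + N_0},\, M\right) + (1-\rho_L) P_e\!\left(\frac{E_b'}{N_0},\, M\right) > \frac{M-1}{M} \qquad (3.22) \]

We show that for a particular choice of $\rho_L$ ($\rho_L = 1$) condition (3.20) holds for all sufficiently large M. With $\rho_L = 1$, condition (3.20) becomes

\[ 1 - P_e\!\left(\frac{E_b'}{N_J + N_0},\, M\right) < \frac{P_e\!\left(\frac{E_b'}{N_J + N_0},\, M\right)}{M-1} \qquad (3.23) \]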


for  which  it  clearly  suffices  if 


(3.24) 


since we know that

\[ \lim_{M\to\infty} P_e\!\left(\frac{E_b'}{N_J + N_0},\, M\right) = 1. \]

By an easy extension of the derivation in [Vite 64, pp. 106-134] (3.24) can be verified. Thus we can say from (3.19), (3.20) and (3.24) that

\[ \lim_{M,L\to\infty}\ \sup_{0<\rho_L\le 1} P_e^L(\rho_L, E_b', N_J, N_0, M) = 1. \]

We have established so far that for $E_b'/(N_0 + N_J) > \ln 2$

\[ \lim_{M,L\to\infty}\ \sup_{0<\rho_L\le 1} P_e^L(\rho_L, E_b', N_J, N_0, M) = 0 \qquad (3.25) \]

and for $E_b'/(N_0 + N_J) < \ln 2$

\[ \lim_{M,L\to\infty}\ \sup_{0<\rho_L\le 1} P_e^L(\rho_L, E_b', N_J, N_0, M) = 1 \qquad (3.26) \]

Note however that in our diversity signalling scheme the actual bit energy to noise ratio is not $E_b'/(N_0 + N_J)$ but $E_b/(N_0 + N_J) = L E_b'/(N_0 + N_J)$, and thus in both (3.25) and (3.26) we have allowed very large bit energy to noise ratios. Therefore we can say that when we use orthogonal signalling with diversity against an intelligent jammer our scheme allows reliable communication when the bit energy to noise ratio ($E_b/(N_0 + N_J)$) increases fast enough with diversity, i.e. $E_b/(N_0 + N_J) > L \ln 2$ for sufficiently large L. If $E_b/(N_0 + N_J) < L \ln 2$ for sufficiently large L then the worst-case jammer can frustrate even such high energy to noise ratios.
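The zero-or-one behaviour under conditions (3.19) and (3.20) can be illustrated with a small Monte Carlo sketch. It is not part of the original analysis: the function name, the seed, the trial count and the convention that ties count as errors are all assumptions made for illustration. Symbol 0 plays the role of the transmitted symbol j, each repetition is decoded wrongly with probability alpha, and a wrong decision is uniform over the other M - 1 symbols.

```python
import random
from collections import Counter


def majority_vote_error(alpha, M, L, trials=200, seed=0):
    """Estimate the error rate of majority-logic decoding over L repetitions.

    alpha is the per-repetition error probability; symbol 0 is the one sent.
    Ties for the largest count are treated as errors (an assumption).
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        counts = Counter()
        for _ in range(L):
            if rng.random() < alpha:
                # Wrong decision, uniform over the M - 1 other symbols.
                counts[rng.randrange(1, M)] += 1
            else:
                counts[0] += 1
        best = max(counts.values())
        winners = [s for s, c in counts.items() if c == best]
        if winners != [0]:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    M = 8
    print("threshold on alpha:", (M - 1) / M)
    for alpha in (0.5, 0.95):
        print(alpha, majority_vote_error(alpha, M, L=400))
```

With M = 8 the threshold on alpha is 7/8, so alpha = 0.5 falls under (3.19) and alpha = 0.95 under (3.20).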


3.4.2  Linear  Combining 


Another  commonly  used  method  of  diversity  combining  is  linear  combining. 
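Here we process the output to get the following decision variables:

\[ D_i = \sum_{l=1}^{L} y_{i,l}, \qquad i = 1, \ldots, M. \]

When symbol j is sent then for $i \neq j$

\[ D_i = \sum_{l=1}^{L} \left( z_l n_{i,l} + N_{i,l} \right) \]

and for $i = j$

\[ D_j = \sum_{l=1}^{L} \left( z_l n_{j,l} + N_{j,l} + \sqrt{E} \right). \]

Again we use $P_e^L(\rho_L, E_b', N_J, N_0, M)$ to denote the probability of error. By conditioning on the number of diversities jammed we can write

\[ P_e^L(\rho_L, E_b', N_J, N_0, M) = \sum_{k=0}^{L} \binom{L}{k} \rho_L^k (1-\rho_L)^{L-k}\, P_e\!\left(\frac{L E_b'}{k N_J/\rho_L + L N_0},\, M\right). \]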


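Consider first the case $E_b'/(N_0 + N_J) > \ln 2$.

\[ P_e^L(\rho_L, E_b', N_J, N_0, M) = (1-\rho_L)^L P_e\!\left(\frac{E_b'}{N_0},\, M\right) + \sum_{k=1}^{L} \binom{L}{k} \rho_L^k (1-\rho_L)^{L-k}\, P_e\!\left(\frac{\rho_L L E_b'}{k N_J + \rho_L L N_0},\, M\right). \qquad (3.27) \]

Since $E_b'/N_0 > E_b'/(N_0 + N_J) > \ln 2$ the first term goes to 0 with M. Let the jammer choose $\rho_L = s/L$ where s is some number > 0. Let $\alpha$ denote the second term in (3.27). Then

\[ \alpha = \sum_{k=1}^{L} \binom{L}{k} \rho_L^k (1-\rho_L)^{L-k}\, P_e\!\left(\frac{\rho_L L E_b'}{k N_J + \rho_L L N_0},\, M\right) = \sum_{k=1}^{L} \binom{L}{k} \rho_L^k (1-\rho_L)^{L-k}\, P_e\!\left(\frac{s E_b'}{k N_J + s N_0},\, M\right). \]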

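Since s is arbitrary, let it be chosen so that it satisfies

\[ \frac{E_b'}{N_J/s + N_0} < \ln 2, \]

that is, s is chosen to be less than $\rho^*$ (as in Section 3.1). Then we see that every term $P_e\!\left(\frac{s E_b'}{k N_J + s N_0}, M\right)$ approaches 1 with increasing L, M. Hence as $L, M \to \infty$, $\alpha$ goes to

\[ \lim_{L\to\infty} \sum_{k=1}^{L} \binom{L}{k} \rho_L^k (1-\rho_L)^{L-k} = \lim_{L\to\infty} \left( 1 - (1-\rho_L)^L \right) = 1 - e^{-s}. \]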


By choosing s to be only slightly smaller than $\rho^*$ we see that

\[ \lim_{M,L\to\infty} P_e^L(\rho_L, E_b', N_J, N_0, M) \ge 1 - e^{-\rho^*} > 0. \qquad (3.28) \]
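The limit used above, namely that the binomial mass on "at least one transmission jammed" tends to 1 - e^{-s} when the jammed fraction is s/L, can be checked numerically. The sketch below is only an illustration under that assumption; the function name and the sample values of s and L are hypothetical.

```python
import math


def jammed_mass(L, s):
    """Sum over k >= 1 of C(L,k) rho^k (1-rho)^(L-k) with rho = s / L."""
    rho = s / L
    return sum(
        math.comb(L, k) * rho**k * (1.0 - rho) ** (L - k)
        for k in range(1, L + 1)
    )


if __name__ == "__main__":
    s = 0.5
    for L in (10, 100, 500):
        closed_form = 1.0 - (1.0 - s / L) ** L
        print(L, jammed_mass(L, s), closed_form, 1.0 - math.exp(-s))
```

The explicit sum and the closed form 1 - (1 - s/L)^L agree to rounding error, and both approach 1 - e^{-s} as L grows.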


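Now we consider the case $E_b'/(N_0 + N_J) < \ln 2$. Suppose that the jammer chooses $\rho_L = 1$ for all L. Then

\[ P_e^L(\rho_L, E_b', N_J, N_0, M) = P_e\!\left(\frac{E_b'}{N_J + N_0},\, M\right). \]

Since $E_b'/(N_0 + N_J) < \ln 2$,

\[ \lim_{L,M\to\infty} P_e\!\left(\frac{E_b'}{N_J + N_0},\, M\right) = 1. \qquad (3.29) \]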


Thus we have established for linear combining that for $E_b'/(N_0 + N_J) > \ln 2$

\[ \lim_{M,L\to\infty}\ \sup_{0<\rho_L\le 1} P_e^L(\rho_L, E_b', N_J, N_0, M) \ge 1 - e^{-\rho^*} \qquad (3.30) \]

and for $E_b'/(N_0 + N_J) < \ln 2$

\[ \lim_{M,L\to\infty}\ \sup_{0<\rho_L\le 1} P_e^L(\rho_L, E_b', N_J, N_0, M) = 1 \qquad (3.31) \]

Thus (3.28) indicates that the worst-case jammer (when $E_b'/(N_0 + N_J) > \ln 2$) jams such that $\rho_L \to 0$ and (3.29) shows that if $E_b'/(N_0 + N_J) < \ln 2$ then the worst-case jammer can choose $\rho_L = 1$. In either case the jammer can frustrate very high bit energy to noise ratios ($E_b/(N_0 + N_J) = L E_b'/(N_0 + N_J)$).

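We note that although for $E_b'/(N_0 + N_J) > \ln 2$ the worst-case jammer chooses $\rho_L$ such that $\rho_L \to 0$, $\rho_L$ cannot approach 0 too fast for the following reason. Let the jammer choose $\rho_L = 1/L^a$ ($a > 1$). Now

\[ P_e^L(\rho_L, E_b', N_J, N_0, M) = (1-\rho_L)^L P_e\!\left(\frac{E_b'}{N_0},\, M\right) + \sum_{k=1}^{L} \binom{L}{k} \rho_L^k (1-\rho_L)^{L-k}\, P_e\!\left(\frac{\rho_L L E_b'}{k N_J + \rho_L L N_0},\, M\right). \]

By the jammer's choice of $\rho_L$

\[ P_e^L(L^{-a}, E_b', N_J, N_0, M) = (1-L^{-a})^L P_e\!\left(\frac{E_b'}{N_0},\, M\right) + \sum_{k=1}^{L} \binom{L}{k} L^{-ak} (1-L^{-a})^{L-k}\, P_e\!\left(\frac{E_b'}{L^{a-1} k N_J + N_0},\, M\right). \]

Again we point out that the first term goes to 0 with L, M. The second term is

\[ \sum_{k=1}^{L} \binom{L}{k} L^{-ak} (1-L^{-a})^{L-k}\, P_e\!\left(\frac{E_b'}{L^{a-1} k N_J + N_0},\, M\right). \qquad (3.32) \]

Every term $P_e\!\left(\frac{E_b'}{L^{a-1} k N_J + N_0}, M\right)$ in the summation goes to 1 as L increases and so as $L \to \infty$, (3.32) becomes

\[ \lim_{L\to\infty} \left( 1 - (1-\rho_L)^L \right). \]

Next we show that $\lim_{L\to\infty} (1-\rho_L)^L = 1$ and consequently

\[ \lim_{M,L\to\infty} P_e^L(\rho_L, E_b', N_J, N_0, M) = 0. \]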
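We now prove that $\lim_{L\to\infty} (1-\rho_L)^L = 1$.

\[ (1-\rho_L)^L = \left(1 - \frac{1}{L^a}\right)^L \]

We know $\lim_{L\to\infty} (1 - \frac{t}{L})^L = e^{-t}$ and, since $a > 1$, for every $t \in (0, 1)$ there exists $L_0$ such that for all $L > L_0$

\[ \left(1 - \frac{t}{L}\right)^L \le \left(1 - \frac{1}{L^a}\right)^L \]

and so

\[ \lim_{L\to\infty} \left(1 - \frac{1}{L^a}\right)^L \ge e^{-t}. \]

Since t is arbitrary and $(1-\rho_L)^L \le 1$ we have

\[ \lim_{L\to\infty} (1-\rho_L)^L = 1. \]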
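The role of the exponent a can be seen numerically. The sketch below is illustrative only: the function name and the sample values of L and a are assumptions, and log1p is used purely for numerical stability. It evaluates the probability that none of the L transmissions is jammed when the jammed fraction is L^{-a}.

```python
import math


def prob_no_transmission_jammed(L, a):
    """Return (1 - rho_L)^L for rho_L = L**(-a), with L >= 2."""
    return math.exp(L * math.log1p(-(L ** (-a))))


if __name__ == "__main__":
    for a in (1.0, 1.5, 2.0):
        values = [prob_no_transmission_jammed(L, a) for L in (10, 1000, 100000)]
        print(a, values)
```

For a = 1 the values settle near e^{-1}, which is the inverse-proportional choice of the jammed fraction; for a > 1 they drift towards 1, so asymptotically the jammer hits no transmission at all.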

In linear combining we see that since the outputs of the L diversity transmissions are summed, small values of $\rho_L$, while making it less likely that a diversity transmission is jammed, make the probability of error on such a jammed transmission very high because of the low bit energy to noise ratio on such a transmission. The jammer's strategy of choosing $\rho_L$ to be inversely proportional to the diversity level L is intuitively explicable. Since L outputs are added he jams such that if he hits one transmission there is enough jamming power to corrupt the sum statistic. On the other hand, trying to put too much jamming power in a single jammed transmission turns out not to be too effective because then the number of good transmissions increases sufficiently to overcome the jamming noise. We thus see that in linear combining the few jammed transmissions have a significant effect on the probability of error. This suggests that if we use a form of clipped linear combining, wherein we clip the output of each diversity transmission, our probability of error will improve because the infrequent transmissions with the high $P_e$ values will affect our sum much less. Possibly the clipping level can be chosen as a function of L. In the next section we conduct the analysis using this idea.


3.4.3  Clipped  Linear  Combining 


The  analysis  in  the  previous  sections  suggests  that  the  jammer  contributes 
infrequently  to  the  decision  statistics  but  when  he  does  so  his  contribution  is  large. 
This  suggests  that  some  form  of  limiting  the  jammer's  contribution  to  the  decision 
statistics  may  be  effective.  Here  we  first  clip  each  of  the  diversity  transmission 
outputs  by  a  symmetric  limiter  and  then  combine  linearly. 

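Thus the decision variables we use are

\[ D'_j = \sum_{l=1}^{L} C_L\!\left( z_l n_{j,l} + N_{j,l} + \sqrt{E} \right) \quad \text{when } j \text{ is sent} \qquad (3.34) \]

\[ D'_i = \sum_{l=1}^{L} C_L\!\left( z_l n_{i,l} + N_{i,l} \right), \quad i \neq j \qquad (3.35) \]

where

\[ C_L(x) = \begin{cases} x, & -a_L < x < a_L \\ a_L, & x \ge a_L \\ -a_L, & x \le -a_L \end{cases} \]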


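The decision rule is to decide that $\hat{i}$ was sent where $D'_{\hat{i}} = \max_i D'_i$. Using this decision rule we calculate the probabilities of being correct

\[ \Pr\{\text{correct} \mid j \text{ is sent}\} = \Pr\{\hat{i} = j \mid j \text{ is sent}\} \qquad (3.36) \]

\[ = \Pr\{D'_i < D'_j,\ \forall i \neq j \mid j \text{ is sent}\} \qquad (3.37) \]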
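A minimal sketch of the symmetric limiter and the resulting decision rule is given here, to contrast clipped and unclipped combining on a toy example. The function names, the data layout (one row of M correlator outputs per diversity transmission) and the numerical values are assumptions for illustration, not taken from the original.

```python
def clip(x, a):
    """Symmetric limiter: pass x through on (-a, a), saturate at +/- a."""
    return max(-a, min(a, x))


def combine(outputs, a=float("inf")):
    """Clip each per-transmission output, add over transmissions, pick the max.

    outputs[l][i] is the output for symbol i on diversity transmission l.
    With a = infinity this reduces to plain linear combining.
    """
    M = len(outputs[0])
    D = [sum(clip(row[i], a) for row in outputs) for i in range(M)]
    decision = max(range(M), key=lambda i: D[i])
    return decision, D


if __name__ == "__main__":
    # Symbol 0 is sent; the third transmission carries a large jamming
    # contribution on symbol 1.
    outputs = [
        [1.0, 0.1, -0.2],
        [0.9, -0.1, 0.0],
        [1.1, 50.0, 0.2],
    ]
    print("linear :", combine(outputs))
    print("clipped:", combine(outputs, a=2.0))
```

In this toy case a single strongly jammed transmission decides the unclipped sum in favour of the wrong symbol, while the clipped sum still favours the transmitted one.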

Again we use $P_e^L(\rho_L, E_b', N_J, N_0, M)$ to denote the probability of error. Now let


where  Yu  =  Cl  {Ze  n3j  +  Njj  +  \/~E}  (i.e.  the  unnormalized  decision  variable 
containing  the  signal)  cPL  =  ^/Var  £)'  and  let 


E  C,  (ZL  nt>/  +  N,./) 


Dx  = 


sfl  <PL 


1  ^  XLJ 

'/L  £i 


(3.39) 


where $X_{l,i} = C_L(z_l n_{i,l} + N_{i,l})$ (i.e. the unnormalized decision variables with only noise). Note that $\Pr(D'_i < D'_j,\ \forall i \neq j \mid j \text{ is sent}) = \Pr(D_i < D_j,\ \forall i \neq j \mid j \text{ is sent})$. Using the $D_i$'s as decision variables we show that we can recapture the asymptotic performance of orthogonal signals over the AWGN channel. Specifically if the number of orthogonal signals, M, and the diversity level L increase in a certain relation to each other and the clipping level $a_L$ is allowed to increase with L, but not too fast, then the probability of error with clipped linear combining exhibits the same threshold behavior in worst-case partial-band jamming that orthogonal signalling in AWGN achieves, i.e.

Theorem 2: i) For $E_b'/(N_0 + N_J) < \ln 2$

\[ \lim_{M,L\to\infty}\ \sup_{0<\rho_L\le 1} P_e^L(\rho_L, E_b', N_J, N_0, M) = 1 \]

ii) For $E_b'/(N_0 + N_J) > \ln 2$

\[ \lim_{M,L\to\infty}\ \sup_{0<\rho_L\le 1} P_e^L(\rho_L, E_b', N_J, N_0, M) = 0 \]

Proof  of  Theorem  2: 

We proceed as follows. First we show that the above threshold phenomenon is true if our decision variables were all Gaussian with parameters chosen in a certain way. We do this in Lemma 1. The corresponding probability of correct decoding we denote as $P_{c,a}$. Then by use of some fairly powerful Central Limit approximations we show that the difference between $P_{c,a}$ and the actual probability of correct decoding goes to zero, establishing the Theorem. We now proceed with Lemma 1. Let

where $\phi_j(x)$ is the density of a normally distributed random variable with unit variance, $\beta_L \to 1$ and $\epsilon_L \to 0$, and $\Phi(x)$ is the distribution function of a normal random variable distributed as $N(0, \tau_L)$ where $\tau_L \to 1$. Next we show that:


Lemma  1: 


i)  If 
Proof of Lemma 1:


P-, 


Let 


- / : 


oo  1 


oo  v 


2jt 


exp 


-T  X  - 


L(E  +  tL) 


2  \  ^  (^)  K 


r  —  e'>  * 

J-  OO  v/27 


t  du 


M-l 


dx 


Xi  =  i  - 


\j  (Mi) 


Hence 


=  r 

c.a  ~  J 


1  .1,3 

-7=  e  ^ 

oo  \Z2tv 


\ 

L 


+ 


e  3  ,J/i  du; 


M-l 


dx, 


and 


lim  Pc 

M.L—oo 


-  -  r. 


Jim  (  )dx ! 

oo  M,L— oo 


(from the Dominated Convergence Theorem since the integrand is dominated by $e^{-x^2/2}$ which is integrable). Now consider

\[ \gamma = \lim_{M,L\to\infty} \gamma_{M,L} \]

as $\delta$ and $\tau_L \beta_L$ as $\gamma_L$, (3.10) becomes

where $\gamma_L \to 1$. Using L'Hospital's rule we get

\[ \begin{cases} -\infty & \text{if } \delta < \ln 2 \\ 0 & \text{if } \delta > \ln 2 \end{cases} \]

which suffices to make (i) and (ii) true as claimed. Thus $\ln \gamma_{M,L} \to -\infty$ if $\delta < \ln 2$ so that $\gamma_{M,L} \to 0$ for $\delta < \ln 2$. For $\delta > \ln 2$, $\ln \gamma_{M,L} \to 0$ so that $\gamma_{M,L} \to 1$. □

Next we show that

where $P_c^L(\rho_L, E_b', N_J, N_0, M) = 1 - P_e^L(\rho_L, E_b', N_J, N_0, M)$. For ease of notation we denote $P_c^L(\rho_L, E_b', N_J, N_0, M)$ simply as $P_c$. Now


I  ?c,a  ~Pc  |  =  i  P c,a  ~  <P}  (x)  [FD,  (x)}M~ldx 

J  —  oo 


+  r  <M*)  -  Fc  j 


where  <^(x)  is  the  density  of  a  random  variable  distributed  as  a  .V 


Z(  E  -f  f & )  \ 
3Li^l) 


with  f-i  — *  0  and  3i  — *  1  and  Fp,(x)  is  the  distribution  function  of  the  random 
variable  D,  .  By  the  triangle  inequality 


I  Pc,a  ~  Pel  <  f"  4>j(x)  S  Fq~  ‘  ( 1 1  -  1  dx 

J  —  OO 

+  r  -  F'0)(x)\\  FDt(x)\M-ldx  +  C{Pr\Yu\>aL\L 

J  —  oo  1 

where  F'Di(x)  is  the  density  of  the  absolutely  continuous  part  of  Fd,{x)  and  C  is 
some  constant.  In  general  if  0  <  a,b  <  1 
Now

$= \beta_L$. Then $\beta_L \to 1$ (see Appendix J). Hence

Now we use Michel's version of the non-uniform Berry-Esseen Theorem [Mich 81], i.e.

Now we address the second term $A_2$.

A,  =  |  5"  (*)  -  F'd,  (i)  |  |fD.(x)  I"-'  dx 

+  c  [Pr  in..  I  >  at  }L 


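It follows that

\[ \nu_L = C \left[ P\{ |Y_{l,j}| > a_L \} \right]^L \le C \left[ P\{ |Y_{l,j} - \mu_L| > a_L - \mu_L \} \right]^L \]

where $\mu_L = E\, Y_{l,j}$. Now

\[ \nu_L \le C \left[ \frac{N_0 + N_J}{(a_L - \mu_L)^2} \right]^L \]

where the last step follows from Appendix I. Since $a_L \to \infty$ and $\mu_L$ is bounded, we have

\[ \lim_{L\to\infty} \nu_L \le 0. \]

It follows that

\[ \lim_{L\to\infty} \nu_L = 0. \]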


Next  we  note  that  F'Dj(x)  is  the  density  of  the  absolutely  continuous  compo¬ 
nent  of  the  normalized  sum  of  L  i.i.d.  random  variables,  — —  .  each  of  which  has 

s/E  .  ,V0  Si 

variance  1  and  mean  pi  =  -  4-  e i  where  ti  — *  0  and  aJL  — ►  -f  — 

aJL  2  2 

i  see  Appendix  J).  Since  <Z»,(x)  has  been  chosen  to  be  the  density  of  a  normal  ran- 

/  JL(E  ~  777  \ 

dum  variable  with  the  same  mean  and  variance  i.e.  of  a  N  — '  ,  - - rrr-  .  1 

U  (t  +  / 

random  variable  where  * /  — »  0.  Ji  — >  1  we  have  that  the  first  term  of  A?  is 

<  j  ,  <p ,( 1 1  -  /■'[,  (xi  ih 

I  sum  >  '  raitsfiirmat  ion  of  variables  p  -  r  u ;  we  have  that  Aj  is 




where now $\phi_j(p)$ is the density of a standard normal random variable and $F'_p(p)$ is the density of the absolutely continuous part of the sum of L i.i.d. random variables each of which has variance 1, mean 0 and bounded absolute third moment. Hence by the result in Appendix H
Thus if M, L and $a_L$ are chosen so that

\[ \frac{M a_L}{\sqrt{L}} \to 0 \]

we have that

\[ \lim_{M\to\infty} P_{c,a} = \lim_{M\to\infty} P_c^L(\rho_L, E_b', N_J, N_0, M). \]
Thus we can conclude that asymptotically our clipped linear combining receiver exhibits the same threshold behavior demonstrated for $P_{c,a}$ in Lemma 1. Thus

If $E_b'/(N_0 + N_J) < \ln 2$, $P_c^L(\rho_L, E_b', N_J, N_0, M) \to 0$.

If $E_b'/(N_0 + N_J) > \ln 2$, $P_c^L(\rho_L, E_b', N_J, N_0, M) \to 1$.

As we have imposed no restrictions on $\rho_L$ we can say that for $E_b'/(N_0 + N_J) < \ln 2$

\[ \lim_{M,L\to\infty}\ \sup_{0<\rho_L\le 1} P_e^L(\rho_L, E_b', N_J, N_0, M) = 1 \]

and for $E_b'/(N_0 + N_J) > \ln 2$

\[ \lim_{M,L\to\infty}\ \sup_{0<\rho_L\le 1} P_e^L(\rho_L, E_b', N_J, N_0, M) = 0. \]


3.5  Conclusions

We have investigated the asymptotic performance of orthogonal signals over channels with both thermal noise and unknown partial-band interference. Knowing that such signalling suffices asymptotically to communicate over the AWGN channel at the limits prescribed by the channel capacity theorem, we tried to recover from the effects of the worst-case unknown partial-band interference on the performance of such signalling. The worst-case partial-band jammer does degrade the asymptotic performance severely but he needs to optimize his strategy for each value of the bit energy to noise ratio ($E_b/(N_J + N_0)$) chosen by the communicator. Our analysis reveals that for bit energy to noise ratios above a constant (ln 2) the jammer can be most effective if he jams only a fraction ($\rho$) of the band. This is because the probability of error for values of $E_b/(N_J + N_0)$ around ln 2 rises dramatically with a small decrease in $E_b/(N_J + N_0)$. The fraction $\rho$ jammed gets smaller as $E_b/(N_J + N_0)$ gets larger (observe however that it does not get too small). This indicates that the jammer is wilfully reducing the probability of affecting a transmission in order that he may cause more serious damage when he does affect a transmission. This observation suggests that simple coding such as diversity in such a case may be very effective.
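The sharp rise near ln 2 can be seen numerically from the standard expression for the symbol error probability of M equal-energy orthogonal signals with coherent detection in white Gaussian noise, $P_e = 1 - \int \phi(y - \mu)\,\Phi(y)^{M-1}\,dy$ with $\mu = \sqrt{2 (E_b/N_0) \log_2 M}$. The sketch below is an illustration only: the function name, the integration range and the step count are assumptions, and the integral is evaluated with a plain trapezoidal rule.

```python
import math


def pe_orthogonal(ebn0, M, steps=4000):
    """Symbol error probability of M orthogonal signals, coherent detection.

    ebn0 is the bit energy to noise ratio (not in dB). The Gaussian-weighted
    integral is truncated to mu +/- 10 and evaluated by the trapezoidal rule.
    """
    mu = math.sqrt(2.0 * ebn0 * math.log2(M))
    lo, hi = mu - 10.0, mu + 10.0
    h = (hi - lo) / steps
    total = 0.0
    for n in range(steps + 1):
        y = lo + n * h
        density = math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * density * cdf ** (M - 1)
    return 1.0 - h * total


if __name__ == "__main__":
    for ebn0 in (0.3, 3.0):  # one value below ln 2, one above
        print(ebn0, [pe_orthogonal(ebn0, 2**k) for k in (4, 7, 10)])
```

Below ln 2 the error probability grows with M, above ln 2 it shrinks, which is the threshold behaviour the jammer exploits by pushing the effective ratio on jammed transmissions just under the threshold.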

Diversity over the partial-band interference channel was next investigated using, at first, majority logic decoding. In this scheme, since the jammer is willing to accept a small probability of affecting transmissions, we expect that the majority of the received diversity transmissions would be received error free. However, the worst-case jammer optimizes his $\rho$ with respect to the diversity level and he is able to ensure that even for very large $E_b/(N_J + N_0)$ the asymptotic symbol error probability is 1. Reliable communication in this case is possible only if $E_b/(N_J + N_0)$ increases with diversity level faster than a certain rate.
Our next diversity scheme was linear combining wherein we simply added the outputs of each diversity transmission. The hope is that the few diversity transmissions that are jammed are nullified in the sum statistic by the many good receptions. In this case the jammer can, by an appropriate choice of $\rho_L$, ensure that for any $E_b/(N_J + N_0)$ the asymptotic error probability is non-zero. The jammer's choice of $\rho_L$ is inversely proportional to the diversity level and thus for large L the jammer is jamming a very small fraction but with a large power. The effect hoped for, i.e. the swamping out of the few bad receptions by the many good ones, does take place but only if the jammer's choice of $\rho_L$ goes to zero much faster.

All  this  suggests  that  in  our  diversity  combining  we  must  find  a  way  of  limiting 
the  contribution  of  any  individual  diversity  transmission  to  the  overall  decision 
statistic.  We  therefore  proceed  with  clipped  linear  combining  wherein  we  first 
clip  the  output  of  each  diversity  transmission  and  then  simply  add.  The  output 
statistics  are  thus  the  sums  of  many  i.i.d.  random  variables  which  suggests  the  use 
of  some  form  of  Central  Limit  Theorem  Approximations.  To  ensure  the  threshold 
behaviour  that  we  are  looking  for  in  the  error  probability  we  need  to  use  powerful 
versions  of  the  Central  Limit  Theorem  for  which  we  need  the  clipping  level,  the 
diversity  level  and  the  number  of  signals  to  satisfy  a  certain  relation.  Doing  so 
we  can  show  that  the  worst-case  jammer  can  be  neutralized  asymptotically,  i.e. 
he  is  seen  to  be  no  more  detrimental  to  reliable  communication  than  the  AWGN 
channel  of  equivalent  noise  spectral  density. 


CHAPTER  IV 


CONCLUSIONS 


In  Chapter  2  we  have  constructed  fairly  general  channel  models  which  are  capable 
of  representing  a  number  of  jamming  situations.  All  our  analysis  was  done  in  a 
game-theoretic  framework.  We  view  the  entire  transmitted  sequence  as  one  play  of 
a  zero-sum  two  person  game.  In  the  case  with  no  side  information  (Case  I)  we  have 
characterized  the  worst-case  jammer  strategy  by  the  number  of  points  of  support 
of  the  worst-case  distribution  as  well  as  by  necessary  and  sufficient  conditions  at 
these  points.  This  allows  us  to  formulate  the  search  for  the  worst  case  jammer 
strategy  as  a  finite  dimensional  nonlinear  programming  problem.  Although  the 
necessary  and  sufficient  conditions  are  not  easy  to  solve  for,  they  are  fairly  easy 
to  test.  Given  the  convexities  of  the  objective  functions  this  suggests  that  it 
would  be  possible  to  develop  efficient  steepest  ascent  (or  descent)  computational 
algorithms  for  these  optimization  problems.  Much  the  same  held  true  for  both  our 
performance  measures,  mutual  information  and  channel  cutoff  rate. 

In  the  cases  with  the  decoder  informed  we  reduce  the  communicator’s  strategy 
set  (either  by  using  the  “compatibility”  assumption  or  by  fixing  a  quantizer).  In 
this case when we have convexity with respect to the jammer's strategy (as in cases AII and BII) we are able to demonstrate the existence of a saddle-point strategy. For the case with non-randomized quantization we are further able to characterize these saddle-point strategies using the earlier theory. Part of the reason we get saddle-point strategies in Case II and not in Case I is that the randomization of the quantizers in Case I does not average out the objective function. In Case II where the decoder is informed of the actual quantizer chosen the objective function does get averaged out and thus we do actually use randomized strategies in this case.

Although our analysis was done mainly for non-adaptive jammers we find that a number of our results hold true for the case of adaptive jamming as well. An appropriate model to use in this case was seen to be the arbitrary "star" varying channel. Despite the increase in the jammer's strategy set we find that in a number of cases he is able to cause no more loss than if he were non-adaptive. Other generalizations of this game are possible, e.g. if we allow the communicator to change his strategy after every transmission based on feedback from the previous transmission or based on observing the jamming noise. These would be sequential games with exchange of information and it is not clear what kind of objective function would have operational significance in this case. While we have an upper bound on the number of points of support of the worst-case distributions it is possible that the number actually needed is less. Also it is possible that the performance is robust with regard to the number of jamming levels and that a jammer with a few levels is able to do fairly well. Such questions can best be answered by numerical analysis.

In Chapter 3 we have investigated the asymptotic performance of orthogonal signals over channels with both thermal noise as well as unknown partial-band interference. The worst-case partial-band jammer does degrade the asymptotic performance severely but he needs to optimize his strategy for each value of the bit energy to noise ratio ($E_b/(N_J + N_0)$) chosen by the communicator. Our analysis reveals that for bit energy to noise ratios above a constant (ln 2) the jammer can be most effective if he jams only a fraction ($\rho$) of the band.

Rki' 1  \  j  —  ,V>i  ?he  worst  ,  use  amnuug  -s  •••»••  •  <  ’•  A 

equivalent  noise  spec?  rai  den<i!v  I  ;;e  ma. ,  ,  •  ■ 
band jamming: getting the communication system to perform around the sharp
rise  in  the  probability  of  error  curve.  To  do  this  however,  the  jammer  must  allow 
a  high  probability  of  not  jamming  a  particular  transmission.  This  suggests  coding 
by  way  of  diversity  as  a  means  of  overcoming  the  jammer. 

The  worst-case  jammer  tries  to  counteract  such  coding  by  making  a  few  trans¬ 
missions  affect  the  resultant  decision  statistic  significantly.  Both  majority  logic 
decoding  and  linear  combining  do  not  perform  well  against  such  a  jammer.  How¬ 
ever,  by  limiting  the  effect  of  a  single  transmission  on  the  decision  statistic,  clipped 
linear  combining  is  able  to  asymptotically  neutralize  partial-band  jamming. 

Our analysis in Chapter 3 was entirely asymptotic. Other interesting questions that could be asked are: how should diversity be chosen as a function of the number of signals to achieve a given probability of error against the worst-case jammer? If the jammer has a peak power constraint then what values of diversity and signal set size will achieve a given error probability against the worst-case jammer?

Although we did our analysis using coherent detection our results are valid for noncoherent detection in all cases except for the clipped linear combining case. This is because the properties we use of the probability of error of M orthogonal signals with coherent detection remain valid even in the noncoherent case. These properties are the monotone decreasing nature of the probability of error with the bit energy to noise ratio and the asymptotic threshold behaviour of the probability of error. However for the clipped linear combining case, the Gaussian approximations we used in the coherent case do not necessarily work. It would be interesting to determine if there is a diversity combining scheme in the noncoherent case which would neutralize the partial-band jammer.
APPENDIX  A


Consider the following metric on the space of D-dimensional distributions on $R^D$:

\[ d(F, G) = \inf\{ h : F(x_1 - h, \ldots, x_D - h) - h \le G(x_1, \ldots, x_D) \le F(x_1 + h, \ldots, x_D + h) + h \ \text{for all } (x_1, \ldots, x_D) \}. \]
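For intuition, the metric can be approximated numerically in the one-dimensional case (D = 1). The sketch below is an illustration under stated assumptions and is not part of the original appendix: the function names are hypothetical, the two inequalities are checked only on a finite grid, and the infimum is located by bisection, which is valid because the inequalities become easier to satisfy as h grows.

```python
def levy_distance(F, G, xs, tol=1e-6):
    """Approximate d(F, G) for one-dimensional CDFs F and G on the grid xs."""

    def satisfied(h):
        slack = 1e-12  # guards against floating-point rounding at equality
        return all(
            F(x - h) - h <= G(x) + slack and G(x) <= F(x + h) + h + slack
            for x in xs
        )

    # h = 1 always satisfies both inequalities because 0 <= F, G <= 1.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if satisfied(mid):
            hi = mid
        else:
            lo = mid
    return hi


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    grid = [i / 1000.0 for i in range(-1000, 2001)]
    shifted = lambda x: uniform_cdf(x - 0.2)  # uniform on [0.2, 1.2]
    print(levy_distance(uniform_cdf, shifted, grid))
    print(levy_distance(uniform_cdf, uniform_cdf, grid))
```

For a uniform distribution shifted by c = 0.2 the defining inequalities reduce to c - h <= h, so the distance is c/2 = 0.1, which the bisection reproduces up to the tolerance.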

We check that d(F, G) satisfies the properties of a metric:

1. $d(F, G) \ge 0$ and $= 0$ iff $F = G$.

2. $d(F, G) = d(G, F)$.

3. $d(F, H) \le d(F, G) + d(G, H)$.

1. Clearly, $d(F, G) \ge 0$. If $d(F, G) = 0$ we consider a sequence $h_n \downarrow 0$ and from the right-continuity of distribution functions and the definition of d we get

\[ G(x_1, \ldots, x_D) \le F(x_1, \ldots, x_D). \]

Similarly

\[ F(x_1, \ldots, x_D) \le G(x_1, \ldots, x_D). \]

Hence $F = G$.

2. Let $d(F, G) = d$. Then for all $h > d$ and for all $(x_1, \ldots, x_D)$

\[ F(x_1 - h, \ldots, x_D - h) - h \le G(x_1, \ldots, x_D). \]

Similarly $F(x_1, \ldots, x_D) \ge G(x_1 - h, \ldots, x_D - h) - h$. Hence $d(F, G) = d(G, F)$.

3. Let $d(F, G) = d_1$, $d(G, H) = d_2$, $d(F, H) = d_3$. Then for $h_1 > d_1$, $h_2 > d_2$ and for all $(x_1, \ldots, x_D)$

\[ F(x_1 - h_1, \ldots, x_D - h_1) - h_1 \le G(x_1, \ldots, x_D) \le F(x_1 + h_1, \ldots, x_D + h_1) + h_1 \]

and

\[ G(x_1 - h_2, \ldots, x_D - h_2) - h_2 \le H(x_1, \ldots, x_D) \le G(x_1 + h_2, \ldots, x_D + h_2) + h_2. \]

Hence

\[ F(x_1 - h_1 - h_2, \ldots, x_D - h_1 - h_2) - h_1 - h_2 \le H(x_1, \ldots, x_D) \le F(x_1 + h_1 + h_2, \ldots, x_D + h_1 + h_2) + h_1 + h_2, \]

so that $d_3 \le d_1 + d_2$. Thus d(F, G) is a metric.

A sequence of distribution functions $F_n$ on $R^D$ is said to converge weakly to F iff for any bounded continuous function $f(x)$ defined on $R^D$ (where x is $(x_1, \ldots, x_D)$)

\[ \int_{R^D} f(x)\, dF_n(x) \to \int_{R^D} f(x)\, dF(x). \]

Lemma: With F, $F_n$ denoting the distribution functions of random variables $X\ (= (X_1, X_2, \ldots, X_D))$ such that $\ell_i \le X_i \le u_i$ the following are equivalent:

1. $F_n \to F$ at every point x which is a continuity point of the distribution F(x).

2. $d(F_n, F) \to 0$.

3. $F_n \to F$ weakly.

Proof: The proof is accomplished by showing the following: 1 ⇒ 2, 1 ⇒ 3, 2 ⇒ 1, 3 ⇒ 1.

i) 1 ⇒ 2. Let C denote the set of continuity points of F(x). Clearly C is dense in $R^D$. Choose $a_1, a_2, \ldots, a_D$ ($\in C$) such that $a_i < \ell_i$, $i = 1, \ldots, D$, and $b_1, b_2, \ldots, b_D$ ($\in C$) such that $b_i = u_i + 1$. Subdivide each $[a_i, b_i]$ by points $a_i = a_{i,0} < a_{i,1} < a_{i,2} < \ldots < a_{i,s} = b_i$, $a_{i,k} \in C$, such that $a_{i,k} - a_{i,k-1} < \epsilon$. Let

$E = \{(x_1, \ldots, x_D) : a_i \le x_i \le b_i\}$. Clearly $\Pr(E) = 1$.

Let L denote the lattice of points with ith coordinate equal to $a_{i,k}$, $0 \le k \le s$. L has $(s + 1)^D$ points. Denote a generic point of L by l. Given any $\epsilon > 0$ choose N large enough so that for $n > N$ at all points $l \in L$ the following inequality is satisfied:

\[ | F_n(l) - F(l) | < \frac{\epsilon}{2}. \]

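Now we prove that for every x and for $n > N$

\[ F(x_1 - \epsilon, \ldots, x_D - \epsilon) - \epsilon \le F_n(x_1, \ldots, x_D) \le F(x_1 + \epsilon, \ldots, x_D + \epsilon) + \epsilon. \]

a) Consider $x \in E$.

Then x lies in one of the lattice cells. Let $\bar{l}$ denote the closest lattice point such that $\bar{l} \ge x$ and let $\underline{l}$ denote the closest lattice point such that $\underline{l} \le x$. Clearly $\bar{l} \le (x_1 + \epsilon, \ldots, x_D + \epsilon)$ and $\underline{l} \ge (x_1 - \epsilon, \ldots, x_D - \epsilon)$. Then

\[ F_n(x) \le F_n(\bar{l}) \le F(\bar{l}) + \frac{\epsilon}{2} \le F(x_1 + \epsilon, \ldots, x_D + \epsilon) + \frac{\epsilon}{2} \]

and

\[ F_n(x) \ge F_n(\underline{l}) \ge F(\underline{l}) - \frac{\epsilon}{2} \ge F(x_1 - \epsilon, \ldots, x_D - \epsilon) - \frac{\epsilon}{2}. \qquad (B.2) \]

Hence

\[ F(x_1 - \epsilon, \ldots, x_D - \epsilon) - \frac{\epsilon}{2} \le F_n(x) \le F(x_1 + \epsilon, \ldots, x_D + \epsilon) + \frac{\epsilon}{2} \]

and therefore

\[ F(x_1 - \epsilon, \ldots, x_D - \epsilon) - \epsilon \le F_n(x) \le F(x_1 + \epsilon, \ldots, x_D + \epsilon) + \epsilon. \]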

b)  Consider  now  £  g  E. 

We  examine  the  two  cases. 

i)  I  3  either  x,-  >  a,-  for  all  i  or  x<  <  6j  for  all  i.  Call  the  set  of  all  such  £,  W. 

ii)  £  W.  By  our  selection  of  E  for  such  £,  F{s)  =  0. 

Case  i)  When  £  6  W  and  x,  <  for  all  i  then  by  our  selection  of  F,  F(z)  = 


0,  Fn(£)  =  0.  Hence 


Fn(£)  <  F{x)  +  e 
En(x)  >  F(x)  -  t 


F(x,  -  e . xD  -  c)  -  e  <  F„(£)  <  F(xx  +  e - -  xD  +  c)  +  e. 


When  £  €  W  and  x,  >  a,  for  all  i  define  the  following  sets: 



t  ={£:  a,  <  x,  <  6,  i  =  1 . D} 

E2  =  E  -  E1. 

Define  /*  as  follows:  consider  all  the  components  of  £,  xM  , .  .  .  xI)f  which  are  greater 
than  6,, , .  .  .  ,  6,^  respectively.  Call  the  set  of  such  indices  Q.  Keeping  the  compo¬ 
nents  with  indices  ^  Q  as  before,  reduce  the  components  with  indices  in  Q  to 
x*  .  x*?  so  that  b,k  <  x*t  <  u,k ,  b,k  <  x*t  —  t  <  u,k  and  and  b,h  <  x*t  +  t  <  u,k 
with  I/,  G  Q .  This  vector  we  call  /’  (it  is  clear  that  such  a  /*  can  always  be 

found).  By  construction  F( i)  =  F([m ),  Fn(x)  =  En(D,  F(xi  -  c, . to  -  t)  = 

F(/*  -  e, - I'd  ~  €)  an^  ^(xi  +  e, . . .  ,xp  +  e)  =  E(/j  +  e, - l'D  +  c).  Also,  (’ 

clearly  belongs  to  E.  Therefore  this  case  is  reduced  to  the  case  when  £  G  E. 
Hence  by  the  argument  in  part  (a) 


F{xl  -  .  .  ,XD  -  t)  -  €  <  Fn(x)  <  F{xy  +t,...,XD  +  €.)  +  €. 


Case  ii)  When  x.  3  W  let  /  denote  the  lattice  point  closest  to  £. 


Then 


Hence 


Fn(x.)  =  0,  F(x.)  =  0,  F(l)  =  0. 
Fn{z)  <  F(jl)  +  e 
Fn(x.)  >  F(x)  >  e. 


E(xt  -e,...,xD-e)-e<  Fn(x)  <  F(x  i  +  e, . . . ,  xD  +  e)  +  e. 


Thus $1 \Rightarrow 2$.
ii) $1 \Rightarrow 3$.

Take any function $f(\underline{x})$ bounded and continuous on $E$. Since $E$ is compact $f$ is
uniformly continuous on $E$. Denote by $U$ an upper bound of $|f(\underline{x})|$ and choose


points 


^  Uj’,1  ^  ^  -  by 


a,i'k  G  C 
i  = 


i  I**  I  .  (  ?  I  ft,  I 


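so that we have a lattice $l$ on $E$ such that $|f(\underline{l}_1) - f(\underline{l}_2)| < \epsilon$ where $\underline{l}_1$ and $\underline{l}_2$ are points belonging to the same lattice cell. Construct the function $f_\epsilon$ which is constant on the lattice cells as follows:

\[ f_\epsilon(\underline{x}) = \begin{cases} f(\underline{y}_j), & \underline{x} \in E \text{ and } \underline{y}_j \text{ is some interior point of the lattice cell } j \text{ to which } \underline{x} \text{ belongs} \\ 0, & \underline{x} \notin E. \end{cases} \]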


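Obviously for any distribution function $G(\underline{x})$

\[ \int f_\epsilon(\underline{x})\, dG(\underline{x}) = \sum_j f(\underline{y}_j)\, \Delta_{a_{1,k_1},\, a_{1,k_1+1}} \cdots \Delta_{a_{D,k_D},\, a_{D,k_D+1}}\, G(x_1, \ldots, x_D) \]

where $a_{i,k_i}$, $a_{i,k_i+1}$ are the $i$-coordinates of the lattice cell containing $\underline{y}_j$ and where

\[ \Delta_{a_1, b_1} \cdots \Delta_{a_D, b_D}\, F(x_1, \ldots, x_D) = F_0 - F_1 + F_2 - \cdots + (-1)^D F_D . \]

$F_i$ is the sum of all $\binom{D}{i}$ terms of the form $F(c_1, \ldots, c_D)$ with $c_k = a_k$ for exactly $i$
integers in $\{1, \ldots, D\}$ and $c_k = b_k$ for the remaining $D - i$ integers.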
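The alternating sum $F_0 - F_1 + \cdots + (-1)^D F_D$ can be illustrated with a short numerical sketch. The function names and the test distribution (independent Uniform(0, 1) coordinates) below are assumptions introduced only for illustration; they do not appear in the text.

```python
from itertools import product


def rect_mass(F, a, b):
    """Mass assigned by the distribution function F to the cell (a_1, b_1] x ... x (a_D, b_D].

    Implements F_0 - F_1 + F_2 - ... + (-1)^D F_D, where F_i collects the
    corner values F(c_1, ..., c_D) with c_k = a_k for exactly i coordinates.
    """
    D = len(a)
    total = 0.0
    # choice[k] == 1 selects c_k = a_k, choice[k] == 0 selects c_k = b_k
    for choice in product((0, 1), repeat=D):
        corner = [a[k] if choice[k] else b[k] for k in range(D)]
        i = sum(choice)  # number of coordinates fixed at a_k
        total += (-1) ** i * F(corner)
    return total


def uniform_cdf(point):
    """Hypothetical test case: joint CDF of independent Uniform(0, 1) coordinates."""
    value = 1.0
    for x in point:
        value *= min(max(x, 0.0), 1.0)
    return value


# For independent uniforms the cell mass is the product of the side lengths.
mass = rect_mass(uniform_cdf, a=[0.2, 0.1], b=[0.5, 0.7])
```

For the two-dimensional cell above the four corner terms are $0.35 - 0.14 - 0.05 + 0.02$, which agrees with the product of the side lengths $0.3 \times 0.6$.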

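Since $F_n(\underline{x}) \to F(\underline{x})$ at the lattice points

\[ \int f_\epsilon(\underline{x})\, dF_n(\underline{x}) \to \int f_\epsilon(\underline{x})\, dF(\underline{x}). \]

Also

\[ \int |f(\underline{x}) - f_\epsilon(\underline{x})|\, dF(\underline{x}) = \int_{E^c} |f(\underline{x}) - f_\epsilon(\underline{x})|\, dF(\underline{x}) + \int_E |f(\underline{x}) - f_\epsilon(\underline{x})|\, dF(\underline{x}) < 0 + \int_E \epsilon\, dF(\underline{x}) = \epsilon. \]

Similarly

\[ \int |f(\underline{x}) - f_\epsilon(\underline{x})|\, dF_n(\underline{x}) < \epsilon. \]

Hence

\[ \left| \int f(\underline{x})\, dF_n(\underline{x}) - \int f(\underline{x})\, dF(\underline{x}) \right| < 3\epsilon \]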


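for sufficiently large $n$. Since $\epsilon$ was arbitrary we have $1 \Rightarrow 3$.

iii) $2 \Rightarrow 1$

Let $\underline{x}_0$ be a continuity point of $F(\underline{x})$. Then for every $\epsilon > 0$ there exists $\delta > 0$ such that

\[ |F(\underline{x}) - F(\underline{x}_0)| < \epsilon \]

if $\|\underline{x} - \underline{x}_0\| < \delta$, where $\|\underline{x}\| = \sqrt{x_1^2 + \cdots + x_D^2}$. Let $h =$ [expression illegible in the source] and let $n$ be sufficiently large so that $d(F_n, F) < h$. Then

\[ F_n(\underline{x}_0) > F(x_{0,1} - h, \ldots, x_{0,D} - h) - h > F(\underline{x}_0) - 2\epsilon \]

\[ F_n(\underline{x}_0) < F(x_{0,1} + h, \ldots, x_{0,D} + h) + h < F(\underline{x}_0) + 2\epsilon. \]

Since $\epsilon$ is arbitrary $2 \Rightarrow 1$.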

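iv) $3 \Rightarrow 1$

Let $\underline{x}_0$ be a continuity point of $F(\underline{x})$ and let $F_n \xrightarrow{w} F$. Take $\delta > 0$ such that for
$\|\underline{x} - \underline{x}_0\| \le \sqrt{D}\,\delta$, $|F(\underline{x}) - F(\underline{x}_0)| < \epsilon$. Define the following sets:

\[ J = \{\underline{x} : x_i \le x_{0,i},\ i = 1, \ldots, D\} \]

\[ J_{-\delta} = \{\underline{x} : x_i \le x_{0,i} - \delta,\ i = 1, \ldots, D\} \]

\[ J_{\delta} = \{\underline{x} : x_i \le x_{0,i} + \delta,\ i = 1, \ldots, D\} \]

\[ J_1 = J - J_{-\delta} \]

\[ J_2 = J_{\delta} - J. \]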


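Construct the functions:

\[ f_1(\underline{x}) = \begin{cases} 1, & \underline{x} \in J_{-\delta} \\ 1 + \dfrac{1}{D\delta} \sum_{i=1}^{D} \left[ x_{0,i} - \delta - \max(x_{0,i} - \delta,\, x_i) \right], & \underline{x} \in J_1 \\ 0, & \text{elsewhere} \end{cases} \]

\[ f_2(\underline{x}) = \begin{cases} 1, & \underline{x} \in J \\ 1 + \dfrac{1}{D\delta} \sum_{i=1}^{D} \left[ x_{0,i} - \max(x_{0,i},\, x_i) \right], & \underline{x} \in J_2 \\ 0, & \text{elsewhere.} \end{cases} \]

$f_1(\underline{x})$ and $f_2(\underline{x})$ are both continuous functions ranging between 0 and 1. On $J_1$, $f_1(\underline{x}) \le 1$ and on $J_2$, $f_2(\underline{x}) \le 1$.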

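Then

\[ \int f_1(\underline{x})\, dF(\underline{x}) \ge \int_{J_{-\delta}} 1\, dF(\underline{x}) = F(x_{0,1} - \delta, \ldots, x_{0,D} - \delta) > F(\underline{x}_0) - \epsilon \tag{B.3} \]

\[ \int f_2(\underline{x})\, dF(\underline{x}) \le \int_{J_{\delta}} 1\, dF(\underline{x}) = F(x_{0,1} + \delta, \ldots, x_{0,D} + \delta) < F(\underline{x}_0) + \epsilon \tag{B.4} \]

\[ \int f_1(\underline{x})\, dF_n(\underline{x}) \le \int_{J} 1\, dF_n(\underline{x}) = F_n(\underline{x}_0) \tag{B.5} \]

\[ \int f_2(\underline{x})\, dF_n(\underline{x}) \ge \int_{J} 1\, dF_n(\underline{x}) = F_n(\underline{x}_0). \tag{B.6} \]

From the fact that $F_n \xrightarrow{w} F$, for sufficiently large $n$

\[ \left| \int f_1(\underline{x})\, dF_n(\underline{x}) - \int f_1(\underline{x})\, dF(\underline{x}) \right| < \epsilon \tag{B.7} \]

\[ \left| \int f_2(\underline{x})\, dF_n(\underline{x}) - \int f_2(\underline{x})\, dF(\underline{x}) \right| < \epsilon. \tag{B.8} \]

From (B.3), (B.4), (B.5), (B.6), (B.7) and (B.8)

\[ F(\underline{x}_0) - 2\epsilon < F_n(\underline{x}_0) < F(\underline{x}_0) + 2\epsilon. \]

Since $\epsilon$ was arbitrary $3 \Rightarrow 1$. This completes the proof of the proposition.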
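The proposition ties pointwise convergence at continuity points to convergence in the Levy metric. A minimal one-dimensional sketch of that metric follows, assuming the definition used in the proof: $d(F, G)$ is the smallest $h$ with $F(x - h) - h \le G(x) \le F(x + h) + h$ for all $x$. The grid, the bisection tolerance and the choice of shifted normal distributions are illustrative assumptions, and the check is only carried out on the grid points.

```python
import math


def normal_cdf(x, mean=0.0):
    """CDF of a unit-variance normal distribution with the given mean."""
    return 0.5 * math.erfc(-(x - mean) / math.sqrt(2.0))


def levy_condition(F, G, h, grid):
    """True if F(x - h) - h <= G(x) <= F(x + h) + h at every grid point."""
    return all(F(x - h) - h <= G(x) <= F(x + h) + h for x in grid)


def levy_distance(F, G, grid, tol=1e-6):
    """Approximate Levy distance by bisection on h (the condition is monotone in h)."""
    lo, hi = 0.0, 1.0  # h = 1 always satisfies the condition because 0 <= F, G <= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if levy_condition(F, G, mid, grid):
            hi = mid
        else:
            lo = mid
    return hi


grid = [-8.0 + 0.01 * k for k in range(1601)]
means = [1.0, 0.5, 0.25, 0.125]
# F_n is N(mean, 1); the limit F is N(0, 1). The distances should shrink with the mean.
distances = [
    levy_distance(normal_cdf, lambda x, m=m: normal_cdf(x, m), grid) for m in means
]
```

Since a shift by $\mu$ moves the distribution function horizontally by $\mu$, each computed distance should be no larger than the corresponding mean, and the sequence should decrease as the means tend to zero.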
Lemma: The set $S$ of distribution functions of random variables $\underline{x} = (x_1, \ldots, x_D)$
such that $0 \le x_i \le b_i$ is compact in the space of distribution functions on $R^D$.
Proof: Let a sequence of distribution functions $F_n(\underline{x})$ be given. Pick a countable set $C$ everywhere dense on the set $R^D$, $C = \{\underline{x}_1, \underline{x}_2, \ldots\}$. By Helly's Weak Compactness Theorem [Loeve 77, pg. 181] there exists a subsequence $F_{n_1}(\underline{x}), \ldots, F_{n_k}(\underline{x}), \ldots$ which converges at every point $\underline{x} = \underline{x}_i$. Let

\[ \varphi(\underline{x}_i) = \lim_{k \to \infty} F_{n_k}(\underline{x}_i) \]

and set

\[ F(\underline{x}) = \sup_{\underline{x}_i < \underline{x}} \varphi(\underline{x}_i). \]

The function $F(\underline{x})$ is defined everywhere on $R^D$ and is obviously non-decreasing
and right-continuous. Clearly $F(\underline{x}) = 1$ for $x_i > b_i$ and $F(x_1, \ldots, c, \ldots, x_D) = 0$ if $c < 0$. Thus $F(\underline{x})$ is a distribution function on $R^D$ and $F \in S$.

Also, it is easy to see that $F_{n_k}(\underline{x})$ converges to $F(\underline{x})$ at every continuity point of
$F(\underline{x})$. This is equivalent to weak convergence of $F_{n_k}$ to $F$ [Ash 72, Th 4.5.1] which
by Lemma 1 is equivalent to convergence in the Levy metric. Hence $d(F_{n_k}, F) \to 0$.

Hence any sequence of points belonging to $S$ has a convergent subsequence in
$S$. Since the space of all distribution functions is a metric space (with the Levy
Lemma 3:

\[ I'_{F_1}(G; F_2) = \int i(z; G, F_1)\, dF_2(z) - I(G; F_1) \]

where

\[ i(z; G, F_1) = \sum_{x,y} p(x)\, p(y|x,z) \log \frac{\int p(y|x,z)\, dF_1}{\sum_{x'} p(x') \int p(y|x',z)\, dF_1}. \]


Proof of Lemma 3


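\[ I'_{F_1}(G; F_2) = \lim_{\alpha \downarrow 0} \frac{1}{\alpha} \Bigg\{ \sum_{x,y} p(x) \Big( \iint p(y|x,z,\theta)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]\, dG(\theta) \Big) \log \frac{\iint p(y|x,z,\theta)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]\, dG(\theta)}{\sum_{x'} p(x') \iint p(y|x',z,\theta)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]\, dG(\theta)} \]
\[ \qquad - \sum_{x,y} p(x) \Big( \iint p(y|x,z,\theta)\, dF_1\, dG(\theta) \Big) \log \frac{\iint p(y|x,z,\theta)\, dF_1\, dG(\theta)}{\sum_{x'} p(x') \iint p(y|x',z,\theta)\, dF_1\, dG(\theta)} \Bigg\}. \]

Denoting $\int p(y|x,z,\theta)\, dG(\theta)$ as $p(y|x,z)$,

\[ I'_{F_1}(G; F_2) = \lim_{\alpha \downarrow 0} \frac{1}{\alpha} \Bigg\{ \sum_{x,y} p(x) \int p(y|x,z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2] \log \frac{\int p(y|x,z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]}{\sum_{x'} p(x') \int p(y|x',z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]} \]
\[ \qquad - \sum_{x,y} p(x) \int p(y|x,z)\, dF_1 \log \frac{\int p(y|x,z)\, dF_1}{\sum_{x'} p(x') \int p(y|x',z)\, dF_1} \Bigg\} \]

\[ = \lim_{\alpha \downarrow 0} \frac{1}{\alpha} \sum_{x,y} p(x) \Bigg[ \int p(y|x,z)\, \alpha\, dF_2 \log \frac{\int p(y|x,z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]}{\sum_{x'} p(x') \int p(y|x',z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]} - \int p(y|x,z)\, \alpha\, dF_1 \log \frac{\int p(y|x,z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]}{\sum_{x'} p(x') \int p(y|x',z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]} \Bigg] \]
\[ \qquad + \lim_{\alpha \downarrow 0} \frac{1}{\alpha} \sum_{x,y} p(x) \Bigg[ \int p(y|x,z)\, dF_1 \log \frac{\int p(y|x,z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]}{\sum_{x'} p(x') \int p(y|x',z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]} - \int p(y|x,z)\, dF_1 \log \frac{\int p(y|x,z)\, dF_1}{\sum_{x'} p(x') \int p(y|x',z)\, dF_1} \Bigg] \]

\[ = a + b \text{ (say).} \]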


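By choosing a sequence $\alpha_n \downarrow 0$ and using weak convergence of $(1 - \alpha_n)\,dF_1 + \alpha_n\, dF_2$ to $dF_1$

\[ a = \int i(z; G, F_1)\, dF_2 - I(G; F_1) \]

\[ b = \frac{d}{d\alpha} \sum_{x,y} p(x) \int p(y|x,z)\, dF_1 \log \frac{\int p(y|x,z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]}{\sum_{x'} p(x') \int p(y|x',z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]}. \]

Taking the derivative

\[ b = \sum_{x,y} p(x) \int p(y|x,z)\, dF_1 \cdot \frac{d}{\int p(y|x,z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]} \cdot \frac{1}{d^2} \Bigg[ d \int p(y|x,z)\,(dF_2 - dF_1) - \int p(y|x,z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2] \sum_{x'} p(x') \int p(y|x',z)\,(dF_2 - dF_1) \Bigg] \]

where $d = \sum_{x'} p(x') \int p(y|x',z)\,[(1-\alpha)\,dF_1 + \alpha\, dF_2]$.

After some algebraic manipulation it can be shown that $b \to 0$ as $\alpha \downarrow 0$.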
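The statement of Lemma 3 can be sanity-checked numerically on a small discrete example: the one-sided difference quotient of the mutual information along $(1-\alpha)F_1 + \alpha F_2$ should approach $\int i(z; G, F_1)\, dF_2 - I(G; F_1)$. The sketch below assumes finite alphabets for $x$, $y$ and $z$, treats $p(y|x,z)$ as already averaged over $\theta$, and uses randomly generated channel matrices; all names and numerical values are hypothetical.

```python
import math
import random


def mixed_channel(chan, F):
    """q[x][y] = sum_z F(z) p(y|x,z): the channel seen when z is drawn from F."""
    nz, nx, ny = len(chan), len(chan[0]), len(chan[0][0])
    return [[sum(F[z] * chan[z][x][y] for z in range(nz)) for y in range(ny)]
            for x in range(nx)]


def log_ratio(p_x, q):
    """log( q(y|x) / sum_x' p(x') q(y|x') ) for every pair (x, y)."""
    nx, ny = len(q), len(q[0])
    r = [sum(p_x[x] * q[x][y] for x in range(nx)) for y in range(ny)]
    return [[math.log(q[x][y] / r[y]) for y in range(ny)] for x in range(nx)]


def mutual_information(p_x, chan, F):
    """I(G; F) for input distribution p_x when the state z has distribution F."""
    q = mixed_channel(chan, F)
    lr = log_ratio(p_x, q)
    return sum(p_x[x] * q[x][y] * lr[x][y]
               for x in range(len(q)) for y in range(len(q[0])))


def i_density(z, p_x, chan, F1):
    """i(z; G, F1) = sum_{x,y} p(x) p(y|x,z) log(ratio evaluated at F1)."""
    lr = log_ratio(p_x, mixed_channel(chan, F1))
    return sum(p_x[x] * chan[z][x][y] * lr[x][y]
               for x in range(len(p_x)) for y in range(len(chan[z][0])))


random.seed(0)
nz, nx, ny = 3, 2, 3
chan = []  # chan[z][x][y] plays the role of p(y|x,z); rows are normalized
for _ in range(nz):
    rows = []
    for _ in range(nx):
        w = [random.random() + 0.1 for _ in range(ny)]
        rows.append([v / sum(w) for v in w])
    chan.append(rows)

p_x = [0.4, 0.6]
F1 = [0.5, 0.3, 0.2]
F2 = [0.1, 0.2, 0.7]

# Right-hand side of Lemma 3
predicted = (sum(F2[z] * i_density(z, p_x, chan, F1) for z in range(nz))
             - mutual_information(p_x, chan, F1))

# One-sided difference quotient approximating the directional derivative
alpha = 1e-6
mix = [(1 - alpha) * F1[z] + alpha * F2[z] for z in range(nz)]
numeric = (mutual_information(p_x, chan, mix)
           - mutual_information(p_x, chan, F1)) / alpha
```

In this discrete setting the term called $b$ in the proof vanishes because both $\sum_y \int p(y|x,z)(dF_2 - dF_1)$ and the corresponding sum over the output marginal are zero, which is why `predicted` and `numeric` should agree up to an error of order $\alpha$.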
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Here  we  consider  a  ..  ommumcat  sou  <ame  *  '  t  '  .1  -•  •  - 

an  input  distribution  ’■  on  trie-  M  ar%  up..'  .t.nn.U'e  i.  »  * 

the  M  \  l.  transition  prooabiht'.  matrx  .  <••■  V  i 
output  random  variables  respective;',  aim  .et  .  •. 

random  variable  associated  with  the  coim;t .ot.a  ■  -  -- 

feasible  a'*  i  =  ,rli-  '*  u  be  compact  I  he  name  :<  -  i  •  . 

[ni . n.vf))-  Assume  this  function  is  linear  .uni  to:  i  - 

1 . \/  the  channel  chosen  is  symmetric  l.e'  !  ^  ~ \  N  •  »•;  v.  . 

is  r  and  B’s  choice  is  rj,.  Let  n,.  .  n  w  be  .  oust  r.\. tie. :  *  •  ■  •, 

i  =  1 . c  where  /,  is  a  convex,  symmetric  tunc' ion  ot  ■’  ••  w 

invariant under any permutation of $\eta_1, \ldots, \eta_M$. Then a saddle point strategy exists
for both players and for player A it is to choose a uniform distribution on the
input and for player B it is to choose all the components of $\underline{\eta}$ equal, i.e. there
exists $\underline{\eta}^*$ with all its components equal such that

\[ I(\underline{r}, \underline{\eta}^*) \le I(\underline{r}^*, \underline{\eta}^*) \le I(\underline{r}^*, \underline{\eta}) \]

where $\underline{r}^*$ corresponds to the uniform input distribution.

Proof: Step 1: $I(\underline{r}, \underline{\eta}^*) \le I(\underline{r}^*, \underline{\eta}^*)$

This follows from the fact that the mutual information between the input and the
output of a symmetric channel is maximized by the uniform distribution.

Step 2: $I(\underline{r}^*, \underline{\eta}^*) \le I(\underline{r}^*, \underline{\eta})$


a  men.  :$  linear  :n  u.-  I  r-H!  is  convex 


•  :>•  ra.u: ■>  : >et  ot  teas: Die  n  s  is  a  convex 


ii.  at  some  u , 


Then  we 


•  nc  '  a  '  :ie  minimum  is  a*  so  achieved  at  rj.".  The 
•  :.e  and  ’tie  -symmetry  of  the  constraints 

/a  :.  :  >a>  ae  nave  a  new  channel  p*\y  x I 

•.  rx  >:  •  :se  ’f  * :.e  >r.eir.ai  channel.  The  mutual 

a  •  \  'A  ..c.sider  all  the  A/l  permutations 

■  a  >:>  ir-‘  not  ustinct  hut  it  does  not  matter). 


...  •  n  '  =  -a1-  F very  component  of  is 

•  --v  •:  1  •:  w  r.t.  u  we  know  that 


_  V'  .. 

v;  — .  ^ 


irS' 


;e\,v4  At  r  * y *« v  !‘he  result  then  follow?  from  the 




APPENDIX  F 


Lemma  4  :  D*  (T.(G,  P,Fa))  =  E 

\  y  \  2\/p(y\x2,z)dF1 


\/fp{y  1*2, /p(y|*i,  z)dF2\ \  m/^ 

JJ  0(,l)' 

Proof  of  Lemma  4: 


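\[ D'_{F_1}(T_0(G,P,F_2)) = \lim_{\alpha \downarrow 0} \frac{1}{\alpha} \Bigg[ E\Bigg( \sum_y \sqrt{\int p(y|x_1,z)\,(\alpha\, dF_2 + (1-\alpha)\, dF_1)}\, \sqrt{\int p(y|x_2,z)\,(\alpha\, dF_2 + (1-\alpha)\, dF_1)} \Bigg) - E\Bigg( \sum_y \sqrt{\int p(y|x_1,z)\, dF_1 \int p(y|x_2,z)\, dF_1} \Bigg) \Bigg] \]

\[ = \frac{d}{d\alpha} \Bigg\{ E\Bigg( \sum_y \sqrt{\int p(y|x_1,z)\,(\alpha\, dF_2 + (1-\alpha)\, dF_1) \int p(y|x_2,z)\,(\alpha\, dF_2 + (1-\alpha)\, dF_1)} \Bigg) \Bigg\} \text{ at } \alpha = 0. \]

Using the Dominated Convergence Theorem we have

\[ D'_{F_1}(T_0(G,P,F_2)) = E\Bigg( \sum_y \Bigg[ \frac{\sqrt{\int p(y|x_1,z)\,(\alpha\, dF_2 + (1-\alpha)\, dF_1)}\ \int p(y|x_2,z)\,(dF_2 - dF_1)}{2 \sqrt{\int p(y|x_2,z)\,(\alpha\, dF_2 + (1-\alpha)\, dF_1)}} + \frac{\sqrt{\int p(y|x_2,z)\,(\alpha\, dF_2 + (1-\alpha)\, dF_1)}\ \int p(y|x_1,z)\,(dF_2 - dF_1)}{2 \sqrt{\int p(y|x_1,z)\,(\alpha\, dF_2 + (1-\alpha)\, dF_1)}} \Bigg] \Bigg) \text{ at } \alpha = 0 \]

$\sqrt{\int p(y|x_1,z)\, dF_1}\ \int p(y|x_2,z)\,(dF_2 - dF_1)$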
APPENDIX  G


$P_e(x, M)$ is a monotone decreasing function of $x$.

i)  Coherent  detection:

As $x \uparrow$, $Q(z + \sqrt{2x \log M}) \downarrow$

\[ K = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2} \left[ 1 - Q(z + \sqrt{2x \log M}) \right]^{M-1} dz \ \uparrow \]

\[ P_e(x, M) = 1 - K \ \downarrow \]

ii)  Non-coherent  detection: 

As  x  f,  IQ  ( y/xz )  e“^r  j 

••  pc{x,M)  =  jf  zl0(xz)  exp  Z  j  1-  n(l-  exp  dz  | 

t*AT 
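The monotonicity argument for coherent detection can be checked numerically. The sketch below evaluates $K$ with a trapezoidal rule and confirms that $P_e(x, M) = 1 - K$ decreases as $x$ grows. The integration range, the step count, the use of the natural logarithm for $\log M$ and the value $M = 4$ are assumptions made for illustration only.

```python
import math


def q_function(t):
    """Gaussian tail probability Q(t) = P(N(0, 1) > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def coherent_error_probability(x, M, z_max=10.0, steps=4000):
    """P_e(x, M) = 1 - K, with K integrated numerically by the trapezoidal rule."""
    shift = math.sqrt(2.0 * x * math.log(M))  # natural log assumed here
    h = 2.0 * z_max / steps
    total = 0.0
    for k in range(steps + 1):
        z = -z_max + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += (weight * math.exp(-0.5 * z * z)
                  * (1.0 - q_function(z + shift)) ** (M - 1))
    K = total * h / math.sqrt(2.0 * math.pi)
    return 1.0 - K


xs = [0.0, 0.5, 1.0, 2.0, 4.0]
values = [coherent_error_probability(x, M=4) for x in xs]
```

At $x = 0$ the integral reduces to $\int \phi(z)\,\Phi(z)^{M-1}\,dz = 1/M$, so the first entry of `values` should be close to $1 - 1/M$, and later entries should be strictly smaller.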
APPENDIX  H


We modify a result due to S. Kh. Sirazhdinov and M. Mamatov [Sira 62].

Let $X_1, \ldots, X_n$ be independent random variables with the common distribution $F(x)$. Let $E(X_i) = 0$, $E(X_i^2) = 1$. Let $F_n(x)$ be the distribution of $\frac{X_1 + \cdots + X_n}{\sqrt{n}}$. It is well-known that $F_n(x)$ can be represented uniquely as a sum of the form

\[ F_n(x) = A_n C_n(x) + (1 - A_n) S_n(x), \qquad 0 \le A_n \le 1 \tag{H.1} \]

where $C_n(x)$ represents the absolutely continuous part and $S_n(x)$ represents the
singular and step-function parts of $F_n(x)$.

Let $p_n(x)$ be the density of the absolutely continuous part.

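Theorem 1: If $\exists\, n_0$ such that $A_{n_0} > 0$ and if $\alpha_3 = E(X_i^3) < \infty$ then
for sufficiently large $n$

\[ \int_{-\infty}^{\infty} |p_n(x) - \phi(x)|\, dx \le \frac{\alpha_3}{6\sqrt{n}} \int_{-\infty}^{\infty} |x^3 - 3x|\, \phi(x)\, dx + \frac{C_1}{\sqrt{n}} \tag{H.2} \]

where

\[ \phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \tag{H.3} \]

i.e. the density of the standard normal distribution, and $C_1$ is some universal
constant.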


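Proof of Theorem 1:

Let

\[ g_n(t) = e^{-t^2/2} + \frac{\alpha_3}{6\sqrt{n}}\, (it)^3\, e^{-t^2/2} \tag{H.4} \]

where $i = \sqrt{-1}$, and let $f(t)$ and $f_n(t) = f^n(t/\sqrt{n})$ (where $f^n(\cdot)$ indicates the $n$-th power, corresponding to the $n$-fold convolution of $F$)
be the characteristic functions of the distributions $F(x)$ and $F_n(x)$ respectively.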


We  need  the  following  Lemmas. 


Lemma  1:  If  /?3  =  E  |  X{  j3  <  oo  and  1 1 1<  then  for 


i)  I  fn  ( t )  -  9n  (<)l  <  ^  (\t  |3  +  It  |6)  exp 


(-3 


ii)  l/n  (0  -  9'n  (t) I  <  ^  (I  <  I2  +  I  t  I7)  exp  f-j 


Proof  of  Lemma  1: 


i)  is  proved  in  [Gned  54]. 

ii)  is  proved  in  [Esse  58]. 

Lemma 2: For $n_0$ as in the hypotheses of Theorem 1,

\[ F_{n_0}(x) = p\, H_1(x) + q\, H_2(x) \tag{H.5} \]

where $p > 0$, $q > 0$, $p + q = 1$ and $H_1(\cdot)$ and $H_2(\cdot)$ are distribution functions
such that if $h_1(t)$ is the characteristic function corresponding to $H_1(x)$ then:

1) $\int_{-\infty}^{\infty} |h_1(t)|^2\, dt < \infty$

2) Given any $\epsilon > 0$ there exists a $C_4 > 0$ such that the inequality
$|h_1(t)| < \exp(-C_4)$ holds for $|t| > \epsilon$.


Proof of Lemma 2:

See [Prok 52].

Lemma 3: If $p > 0$, $q > 0$, $p + q = 1$ then for sufficiently large $n$,

\[ \sum_{m - np < -\sqrt{n} \log n} \binom{n}{m} p^m q^{n-m} \le \frac{C_5}{n^2} \tag{H.6} \]
Proof of Lemma 3:

Let $W$ be a Binomial$(n, p)$ random variable. Then the L.H.S. of (H.6) is $\Pr(W < np - \sqrt{n} \log n)$. Hence (H.6) is true iff $\Pr(W \ge np - \sqrt{n} \log n) \ge 1 - \frac{C_5}{n^2}$.

Now from the generalized Chebyshev inequality we know that

\[ \Pr(W \ge np - \sqrt{n} \log n) \le \frac{E(f(W))}{f(np - \sqrt{n} \log n)} \]

where $f$ is any increasing, non-negative function on its range.
Choosing $f(W) = W^3$ we get

\[ \Pr(W \ge np - \sqrt{n} \log n) \le \frac{E(W^3)}{(np - \sqrt{n} \log n)^3} = \frac{n^3 p^3 - (3p^3 - p^2) n^2 + (2p^3 - p^2 + p) n}{n^3 p^3 + 3 n^2 p^2 \sqrt{n} \log n - 3 n^2 p (\log n)^2 - n \sqrt{n} (\log n)^3} \tag{H.7} \]

\[ = 1 + \frac{n^2 \left( -3p^3 + p^2 - 3p^2 \sqrt{n} \log n + 3p (\log n)^2 \right) + n \left( 2p^3 - p^2 + p + \sqrt{n} (\log n)^3 \right)}{n^3 p^3 + 3 n^2 p^2 \sqrt{n} \log n - 3 n^2 p (\log n)^2 - n \sqrt{n} (\log n)^3} \ge 1 - \frac{C_5}{n^2} \tag{H.8} \]
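The bound (H.6) can be probed numerically by summing the binomial probabilities below the threshold $np - \sqrt{n} \log n$ directly. The sketch below works in the log domain to avoid overflow; the function name, the choice $p = 0.5$ and the sample sizes are assumptions made for illustration, and the comparison against Hoeffding's inequality is a swapped-in reference bound, not part of the argument in the text.

```python
import math


def binomial_lower_tail(n, p):
    """Sum of C(n, m) p^m q^(n - m) over integers m with m - n p < -sqrt(n) log n."""
    q = 1.0 - p
    threshold = n * p - math.sqrt(n) * math.log(n)
    total = 0.0
    for m in range(0, n + 1):
        if m >= threshold:
            break  # remaining m are outside the summation range
        log_term = (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
                    + m * math.log(p) + (n - m) * math.log(q))
        total += math.exp(log_term)
    return total


# n^2 times the tail should stay bounded (here it is in fact tiny) as n grows.
scaled = {n: binomial_lower_tail(n, 0.5) * n ** 2 for n in (500, 1000, 2000)}

# Hoeffding's inequality gives Pr(W <= n p - t) <= exp(-2 t^2 / n); with
# t = sqrt(n) log n this is exp(-2 (log n)^2), which decays faster than 1 / n^2.
hoeffding_2000 = math.exp(-2.0 * math.log(2000) ** 2)
```

The tail decays much faster than $1/n^2$ in these examples, so any fixed constant $C_5$ is comfortably sufficient at these sample sizes.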

Now we proceed with the proof of Theorem 1.

Let $n = n_0 m + r$, $0 \le r < n_0$. Then according to Lemma 2,

\[ F_n(x) = \sum_{j=0}^{m} \binom{m}{j} p^j q^{m-j}\, H_1^{j} * H_2^{m-j} * F^{r} \tag{H.9} \]

Here $*$ denotes convolution and the exponents of the distribution functions $H$
and $F$ denote the corresponding numbers of convolutions.

We now divide the sum into two parts. By $\Sigma_1$ we will mean the sum over
those values of $j$ for which $j - mp \ge -\sqrt{m} \log m$ and by $\Sigma_2$ we will mean
the sum over those values of $j$ for which $j - mp < -\sqrt{m} \log m$, and by $\Sigma$ we
will mean the sum over all values of $j$ so that

\[ F_n(x) = \Sigma_1 \binom{m}{j} p^j q^{m-j}\, H_1^{j} * H_2^{m-j} * F^{r} + \Sigma_2 \binom{m}{j} p^j q^{m-j}\, H_1^{j} * H_2^{m-j} * F^{r}. \]

The distribution $H_1^{j} * H_2^{m-j} * F^{r}$ has the characteristic function

\[ h_1^{j}\!\left(\frac{t}{\sqrt{n}}\right) h_2^{m-j}\!\left(\frac{t}{\sqrt{n}}\right) f^{r}\!\left(\frac{t}{\sqrt{n}}\right) \tag{H.10} \]

and for $j \ge 2$ it has a square integrable density which we denote by $p_{mj}(x)$.
Let

\[ \phi_n(x) = \phi(x) \left[ 1 + \frac{\alpha_3}{6\sqrt{n}}\, (x^3 - 3x) \right]. \]


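Then

\[ \int_{-\infty}^{\infty} |p_n(x) - \phi_n(x)|\, dx \le \int_{-\infty}^{\infty} \left| \Sigma_1 \binom{m}{j} p^j q^{m-j} p_{mj}(x) - \phi_n(x) \right| dx + \Sigma_2 \binom{m}{j} p^j q^{m-j} \tag{H.11} \]

and therefore by Lemma 3

\[ \int_{-\infty}^{\infty} |p_n(x) - \phi_n(x)|\, dx \le \int_{-\infty}^{\infty} |\Sigma_1 - \phi_n(x)|\, dx + \frac{C_6}{n^2}. \tag{H.12} \]

By the Cauchy-Schwarz inequality we have

\[ \left( \int_{-\infty}^{\infty} |\Sigma_1 - \phi_n(x)|\, dx \right)^2 \le \int_{-\infty}^{\infty} |\Sigma_1 - \phi_n(x)|^2 (1 + x^2)\, dx \int_{-\infty}^{\infty} \frac{dx}{1 + x^2} \]
\[ = \pi \int_{-\infty}^{\infty} |\Sigma_1 - \phi_n(x)|^2\, dx + \pi \int_{-\infty}^{\infty} x^2\, |\Sigma_1 - \phi_n(x)|^2\, dx = \pi [I + I_1] \text{ say.} \tag{H.13} \]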
We evaluate $I$ and $I_1$ separately. Since $\Sigma_1$ is square-integrable we have
from Parseval's Theorem.

/  =  r  i  Ei  -  m*)  i2  dx 

J  — oo 

=  /_”  l  E,  (")  P'V-iM(V? )  (‘V? )  (75)  - »"(')  I1  * 


<  /  _  I  Ei  —  s.(t)  I2  *  +  2  / 

Jt>  As/n 


OAV’n 


Ei  I2  dt 


+  2  /  |  «„(()  |2 

Jt>Ay/n 


dt 


(H.14) 


where  A  < 


1 


24& 

From  Lemmas  1  and  3 


/  _  I  El  —  J.(t)  I2  *  =  /|,|<a.V5  I  E  -  Sn(t)  -  El  I2  * 

/„(()-,.«)  I2*  +  2/  IEI2  * 


/|t|<A>/n 
<  2  / 


|t|<A>/n 


<  ^ 


n 


(H.15) 


Also 


[  I  £  I2  dt 

-  itolWS  1  *•  (‘\/t  ) 


lmpdt 


<  f  \hl(z)rdz 

V  n  J t*|>AVnT 


<  /Z  c-^(mP-2)  r°°  |  Ai(z)  |2  ^  <  ^9  (Htl6) 

y  Hq  J —oo  n 


since  j  >  >  1  in  the  sum  (f°r  sufficiently  large  values  of  m) 

2 

Also 


f  |  Jn(i)|2  *  <  V 

./ If  >Ax/n  n 


/|<|>A,/n 


(H.17) 
It follows from (H.15), (H.16) and (H.17) that

\[ I \le \frac{C_{11}}{n}. \tag{H.18} \]


We  now  estimate  I\. 


dx 


h  =  /  *2  I  2  ”  ^»(x 

J—OO  i 

=  fjE  (™)  ^  <r-j  ti  hr3  n  -  i2  dt 

=  i[  +  i'2  +  r3 


(H.19) 


and 


=  1<^  KD'-V.(<)  +  (D'  -  (D' 


dt 


<  2 


L 


i  m  -  m  i2  a  +  2  f  i  ©' ia* 

-'|t|<A'v/n  j 


^12 

n 


(H.20) 


from  Lemmas  1  and  3. 
From  Lemma  2  we  have 


Also 


/j{|>AVn 

Thus  from  (H.18)  and  (H.23)  we  get 


r,  =  2  j  |  (£)'  i1  a  <  ^ 

73  =  f  .  I  K(t)  I2  dt  < 


(H.21) 


(H.22) 


/i  <  — 

n 


(H.23) 


From  (H.20)  ,  (H.21)  and  (H.22)  we  get 

/  I  Pn(*)  -  <M*)  I  dx  <  -£= 

J — oo  y  71 


(H.24) 
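Hence

\[ \int_{-\infty}^{\infty} |p_n(x) - \phi(x)|\, dx \tag{H.25} \]

\[ \le \int_{-\infty}^{\infty} |\phi_n(x) - \phi(x)|\, dx + \int_{-\infty}^{\infty} |p_n(x) - \phi_n(x)|\, dx \tag{H.26} \]

\[ \le \frac{\alpha_3}{6\sqrt{n}} \int_{-\infty}^{\infty} |x^3 - 3x|\, \phi(x)\, dx + \frac{C_1}{\sqrt{n}}. \tag{H.27} \]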
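The leading term of the bound involves $\int |x^3 - 3x|\, \phi(x)\, dx$, read here with an absolute value since the signed integral vanishes. A short numerical sketch evaluates it and compares it with the closed form $2\phi(0) + 8\phi(\sqrt{3})$, which follows from the antiderivative $-(x^2 - 1)\phi(x)$ of $(x^3 - 3x)\phi(x)$; this closed form is a derivation added for illustration and is not stated in the text. The integration range, step count and the helper `leading_term` are likewise assumptions.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def abs_h3_moment(x_max=12.0, steps=240000):
    """Trapezoidal approximation of the integral of |x^3 - 3x| phi(x) over the real line."""
    h = 2.0 * x_max / steps
    total = 0.0
    for k in range(steps + 1):
        x = -x_max + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * abs(x ** 3 - 3.0 * x) * phi(x)
    return total * h


numeric = abs_h3_moment()
# Closed form from integrating piecewise on [0, sqrt(3)] and [sqrt(3), infinity)
closed_form = 2.0 * phi(0.0) + 8.0 * phi(math.sqrt(3.0))


def leading_term(alpha3, n):
    """Hypothetical helper: |alpha3| / (6 sqrt(n)) times the integral above."""
    return abs(alpha3) / (6.0 * math.sqrt(n)) * closed_form
```

Both values are approximately 1.51, so the first term of the bound behaves like $0.25\,|\alpha_3| / \sqrt{n}$.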
APPENDIX  I


\[ \mathrm{Var}(Y_{L,j}) \le \frac{N_0}{2} + \frac{N_J}{2} \]

Proof: 

Denote  Var  (Ylj)  as  V  (Ylj) 
Then 


V(YLj)  =  V(YLJ\Z]tl  =  0)  Pr(Zjt(]a0) 

+V[YLj\ Zjj  ?  0)  Pr(Zjj  ?  0) 
=  (1  -  PL)V(CL(Nhl  +  VE)) 


YPl  V({cL[~,  nu  +  Nuy/E^ 


<  (1  -Pl)  V  (Njj  +  VE) 


+  PL  V 


ni,l  +  Njj  +  Ve 


=  (1  -Pl)V(Njj)  +  pl  V  + 

~  n  ^  N°  .  NJ  ^  NoL 
-(1-Pl)—  Y  —  +  pl~ 

=  No  Nj_ 

2  +  2 


N. 




APPENDIX  J 


We  show  here  that  /?l  =  — *■  1  by  showing  that  ^  -»  1.  Now  = 

— — -y  L'1^ .  Hence  it  suffices  to  show  that  for  the  worst  case  jammer  Var(X[,j)  —* 
Nj/2  +  No/2  and  Var(Yu  — >  Nj/2  +  Nq/2.  From  Appendix  I  we  know  that 
Var(XLj)  <  Nj/2  +  N0/2  and  Var(YLti)  <  Nj/2  +  N0/2  .  Now  by  simply  choosing 
the sequence $\{p_L = 1\}_{L=1}^{\infty}$ it is clear from Appendix I that the jammer can achieve
Var(XLj)  —*  Nj/2+  N0/2  and  Var(Yij)  — ►  Nj/2  +  No/2  .  Thus  clearly  the  worst- 
case  jammer  can  achieve  Var(XL,i )  -*■  Nj/2  +  N0/2  and  Var{YLj)  -+  Nj/2  +  N0/2 

We  also  show  here  that  E(YLti)  =  y/E  -f  e*,  where  ti  0. 

E(YU)  =  E(YLJ\Z]tt  =  0 )Pr(Zjtl  =  0)  +  E(YLJ\Zht  ?  0 )Pr(Zhl  /  0) 


=  (1  -  pl)(Ve  -  6l)  +  Pl{Ve~  (l) 

Ve  +  (1  -  pl)6l  +  pLC,L. 

Consider  first  the  term  (1  —  pi)bi 

(1  ~Pl)6l  =  f 


-aL+2  YE  2x  -(t-SE)! 

—f — — e  No  dx 

aL  \/27rAro 


f-aL+YE  2(y  +  y/E)  =£ 

=  (1  -  Pl)  _  /—Tr~e  -v° 

J -<*1  -\fE  \Z2ttNo 


dy 


1  zL—i.  +  'ffi* 

1  ~  ^  C  °  ~6  N° 



t-ai+'/E  2'/E  -y2 

+{l  -  n)  U-sV  1 %e*d3 

Now as $L \to \infty$ the first term clearly goes to zero. Using the Mean Value Theorem
[Baxt  76,  pg.  230]  from  calculus,  the  second  term  goes  to  zero  too. 

Now consider the second term $p_L \zeta_L$.


•Off.- 


-C.L+VE  2 (y  +  VE) 
•<*L-  '/E  V^(iV0  +  ^e 


eN°+  pl  dy 


L-ar-y/Sy 


=  ^-^=(e 


»l  —  e 


r-aL+\/E  2y/~E 


-(-ar+y/E)2 


r-aL- 

+/>L  / 

J —a  r.  — 


OfL-  VI  PlNq  +  iVj 


dy 


Now if $\inf_L p_L$ were greater than zero then clearly each term goes to zero with
increasing $L$. If $\inf_L p_L = 0$ then each term is of the form $p_L^{c}$ where $c < 1$ and
hence each of the above terms goes to zero with increasing $L$.

